This paper addresses the design issues of a multi-antenna cognitive radio (CR) system that is able to operate concurrently with the licensed primary radio (PR) system. We propose a practical CR transmission strategy consisting of three major stages: environment learning, channel training, and data transmission. In the environment learning stage, the CR transceivers both listen to the PR transmission and apply blind algorithms to estimate the spaces that are orthogonal to the channels from the PR. Assuming time-division duplex (TDD) based transmission for the PR, cognitive beamforming is then designed and applied at the CR transceivers to restrict the interference to/from the PR during the subsequent channel training and data transmission stages. In the channel training stage, the CR transmitter sends training signals to the CR receiver, which applies a linear-minimum-mean-square-error (LMMSE) based estimator to estimate the effective channel. Considering imperfect estimation in both the learning and training stages, we derive a lower bound on the ergodic capacity achievable for the CR in the data transmission stage. From this capacity lower bound, we observe a general learning/training/throughput tradeoff associated with the proposed scheme, pertinent to the transmit power allocation between the training and transmission stages, as well as the time allocation among the learning, training, and transmission stages. We characterize this tradeoff by optimizing the associated power and time allocation to maximize the CR ergodic capacity.

Index Terms 

Cognitive radio, spectrum sharing, multi-antenna systems, environment learning, channel training. 



F. Gao is with the School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, Bremen, Germany, 28759 
(Email: feifeigao@ieee.org). 

R. Zhang and Y.-C. Liang are with the Institute for Infocomm Research, A*STAR, 1 Fusionopolis Way, #21-01 Connexis, 
Singapore 138632 (Email: {rzhang, ycliang}@i2r.a-star.edu.sg). 

X. Wang is with the Department of Electrical Engineering, Columbia University, New York, USA, (Email: 
wangx@ee.columbia.edu). 

Part of this paper will be presented at IEEE ISIT, June 28-July 3 2009, Seoul, Korea. 



May 10, 2009 



DRAFT 




I. Introduction 

The original idea of cognitive radio (CR) envisions that the CR opportunistically accesses the frequency bands allocated to the licensed primary radio (PR) system when the latter is not in operation [1]. In particular, the CR first detects the void frequency bands, also known as "frequency holes", and then transmits over them. The related key technique is called spectrum sensing, which has been thoroughly studied in the literature in recent years [2]-[5]. This opportunistic spectrum access (OSA) idea is strongly supported by the survey of the Federal Communications Commission (FCC) [6], which reveals that the current utilization efficiency of the licensed radio spectrum could be as low as 15% on average. An alternative model for CR operation other than OSA is known as spectrum sharing (SS) [7], in which the CR and PR may transmit concurrently in the same frequency band, provided that the resultant interference power due to the CR transmission at each PR terminal, the so-called interference temperature (IT), is kept below a predefined threshold.

A new type of SS transmission scheme was recently proposed in [8], where multiple antennas are deployed at the CR transmitter (CR-Tx) to enable cognitive beamforming for regulating the resultant interference power levels at PR terminals. However, the scheme proposed in [8] requires that CR-Tx have perfect knowledge of all the channels from CR-Tx to the PR terminals. This assumption is not realistic from a practical viewpoint, since the PR is in general not obliged to help the CR obtain such channel knowledge. Under the assumption of time-division duplex (TDD) transmission for the PR, a breakthrough was later made in [9], where a blind estimation approach is proposed for CR-Tx to obtain partial channel information from CR-Tx to the PR terminals. Based on the estimated partial channel information, transmit cognitive beamforming is designed and shown to be capable of directing the CR's transmit signals only through the null space of the CR-PR channels, thus removing the interference to PR terminals. Unfortunately, this initial effort in [9] is still far from bringing the SS scheme into practical use; for example, the channels between the CR transceivers are assumed to be perfectly known, and the interference from the PR to the CR terminals is ignored in the CR transmission design.

In this work, we develop a more practical CR transmission strategy that addresses many issues not considered in [9]. The main contributions are summarized as follows:
• The proposed CR transmission scheme consists of three major stages: environment learning, channel training, and data transmission. This scheme is more concrete and more practical than that in the existing work [9].

• In addition to the transmit cognitive beamforming method studied in [9], we propose a 
new beamforming method at the CR receiver (CR-Rx) to mitigate the interference from 
the PR. More specifically, both CR-Tx and CR-Rx listen to the PR transmission during the 
environment learning stage and then design the transmit and receive beamforming to null 
the interference to and from the PR, respectively. 

• Instead of assuming perfect channel knowledge between CR-Tx and CR-Rx as in [9], we adopt a training stage for the CR to estimate the effective channel after applying joint transmit and receive beamforming. The optimal training structure is derived to minimize the channel estimation error, taking into account the interference to and from the PR.

• We derive a lower bound on the ergodic capacity achievable for the CR in the data transmission stage, subject to a prescribed IT constraint at the PR, from which we observe a new learning/training/throughput tradeoff¹ associated with the proposed CR transmission scheme, pertinent to the transmit power allocation between the training and data transmission stages, as well as the time allocation among the learning, training, and data transmission stages. Moreover, we optimize the associated power and time allocation to maximize the derived lower bound on the CR ergodic capacity.

The rest of this paper is organized as follows. Section II presents the system model of the multi-antenna CR system. Section III formulates the CR learning, training, and transmission strategies. Section IV derives the lower bound on the CR ergodic capacity and obtains the optimal power and time allocation among the different stages to maximize this lower bound. Section V provides simulation results to corroborate the analysis. Finally, Section VI concludes the paper.

Notations: Vectors and matrices are denoted by boldface lowercase and uppercase letters, respectively; the transpose, complex conjugate, Hermitian, inverse, and pseudo-inverse of a matrix $\mathbf{A}$ are denoted by $\mathbf{A}^T$, $\mathbf{A}^*$, $\mathbf{A}^H$, $\mathbf{A}^{-1}$, and $\mathbf{A}^\dagger$, respectively; $\mathrm{tr}(\mathbf{A})$ and $\det(\mathbf{A})$ denote the trace and the determinant of $\mathbf{A}$, respectively; $\mathrm{diag}\{\mathbf{a}\}$ is a diagonal matrix whose diagonal elements are given by the entries of the vector $\mathbf{a}$; $\mathbf{I}$ denotes the identity matrix; and $E[\cdot]$ denotes the statistical expectation.

¹ This tradeoff is more general and of more practical relevance than the previously proposed sensing-throughput tradeoff [10] and learning-throughput tradeoff [9] for OSA- and SS-based CR systems, respectively.

II. System Model 

We consider a CR system with $M_1$ antennas at terminal CR-T1 and $M_2$ antennas at terminal CR-T2,² as shown in Fig. 1. In the same operating frequency band, there exists a PR link with two terminals, PR-T1 and PR-T2. We assume a TDD mode for both the PR and CR links. Specifically, PR-T1 transmits during an average proportion $\alpha$ of the overall time and receives during the remaining proportion $1-\alpha$. For simplicity, we assume that PR-T2 stays outside the CR's transmission boundary, as shown in Fig. 1. Nevertheless, all the following discussions can be straightforwardly extended to the case where both PR-T1 and PR-T2 are inside the CR's boundary by utilizing the effective interference channel concept proposed in [9]. We denote the number of antennas at PR-T1 by $M_p$ and, for notational brevity, refer to PR-T1 simply as PR.

Let the channels from PR to CR-T1 and CR-T2 be represented by the $M_1 \times M_p$ matrix $\mathbf{G}_1$ and the $M_2 \times M_p$ matrix $\mathbf{G}_2$, respectively. The channel from CR-T1 to CR-T2 is denoted by the $M_2 \times M_1$ matrix $\mathbf{H}$. Each element of all the channels involved is assumed to be an independent circularly symmetric complex Gaussian (CSCG) random variable with zero mean and unit variance. Since both PR and CR operate in TDD mode, the channel reciprocity principle applies, and thus the reverse channels from CR-T1 to PR, from CR-T2 to PR, and from CR-T2 to CR-T1 are $\mathbf{G}_1^T$, $\mathbf{G}_2^T$, and $\mathbf{H}^T$, respectively. Furthermore, we require more antennas at the CRs than at the PR, i.e., $M_j > M_p$ for $j = 1, 2$, in order to enable the environment learning method discussed later in this paper. This requirement on the number of CR antennas is a reasonable cost for the CR to realize concurrent transmission with the PR.

² We do not specify CR-Tx or CR-Rx because both CR terminals transmit and receive alternately in TDD mode.

III. CR Transmission Strategy

As shown in Fig. 2, the CR transmission is divided into consecutive frames, each having a duration of $N$ symbol periods. Each frame is further divided into three consecutive stages: environment learning, channel training, and data transmission, with durations of $N_l$, $N_t$, and $N_d$ symbol periods, respectively, so that $N_l + N_t + N_d = N$. In the environment learning stage, CR-T1 and CR-T2 gain partial knowledge of $\mathbf{G}_1$ and $\mathbf{G}_2$ by listening to the PR's transmission. Since this knowledge is obtained passively, we describe it with the term "learning". In contrast, in the second stage the CR transmitter actively sends out training signals for the receiver to estimate the channel between CR-T1 and CR-T2; this process is therefore described by the term "training". During the last stage, data transmission, CR-T1 and CR-T2 transmit alternately. Note that $N$ is chosen to be, on one hand, sufficiently smaller than the channel coherence time so that all the channels can be safely assumed constant within each frame, and, on the other hand, as large as possible in order to reduce the throughput loss due to the learning and training overheads.

A. Environment Learning Stage 

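Considering that PR switches between transmitting and receiving, the signal sent from PR can be expressed as

$$\mathbf{s}_p(n) = \begin{cases} \bar{\mathbf{s}}_p(n), & \text{if PR transmits} \\ \mathbf{0}, & \text{otherwise} \end{cases}, \qquad n = 1, \ldots, N, \qquad (1)$$

where the $\bar{\mathbf{s}}_p(n)$'s are independent and identically distributed (i.i.d.) random signals with covariance matrix $\sigma_s^2\mathbf{I}$. Then, the average covariance matrix over the entire time period is $\mathbf{R}_p = E[\mathbf{s}_p(n)\mathbf{s}_p^H(n)] = \alpha\sigma_s^2\mathbf{I}$.

The signals received at CR-T1 and CR-T2 during the learning stage are then

$$\mathbf{y}_j(n) = \mathbf{G}_j\mathbf{s}_p(n) + \mathbf{z}_j(n), \qquad n = 1, \ldots, N_l, \qquad (2)$$

for $j = 1, 2$, where $\mathbf{z}_j(n)$ is the independent CSCG noise vector with zero mean and covariance matrix $\sigma_{n_j}^2\mathbf{I}$.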

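1) Ideal Case: The covariance matrices of the received signals at the CRs can be expressed as

$$\mathbf{R}_j = E[\mathbf{y}_j(n)\mathbf{y}_j^H(n)] = \alpha\sigma_s^2\mathbf{G}_j\mathbf{G}_j^H + \sigma_{n_j}^2\mathbf{I} \triangleq \mathbf{Q}_j + \sigma_{n_j}^2\mathbf{I}, \qquad (3)$$

where $\mathbf{Q}_j$ is defined correspondingly. The eigenvalue decomposition (EVD) of $\mathbf{R}_j$ is

$$\mathbf{R}_j = \mathbf{V}_j\boldsymbol{\Sigma}_j\mathbf{V}_j^H + \sigma_{n_j}^2\mathbf{U}_j\mathbf{U}_j^H, \qquad j = 1, 2, \qquad (4)$$

where $\mathbf{V}_j$ is the $M_j \times M_p$ signal subspace matrix and $\mathbf{U}_j$ is the $M_j \times (M_j - M_p)$ noise subspace matrix. Correspondingly, $\boldsymbol{\Sigma}_j$ is the diagonal matrix that contains the $M_p$ largest eigenvalues of $\mathbf{R}_j$. It is easy to verify that $\mathbf{U}_j^H\mathbf{G}_j = \mathbf{0}$ and $\mathbf{V}_j(\boldsymbol{\Sigma}_j - \sigma_{n_j}^2\mathbf{I})\mathbf{V}_j^H = \mathbf{Q}_j$.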

The channel $\mathbf{G}_j$ is related to $\mathbf{V}_j$ by $\mathbf{G}_j = \mathbf{V}_j\mathbf{B}_j$, where $\mathbf{B}_j$ is an unknown $M_p \times M_p$ matrix. Fortunately, knowing $\mathbf{V}_j$ and $\mathbf{U}_j$ is sufficient to design the cognitive transmit beamforming [9]. That is, the CR terminals transmit only through the space spanned by $\mathbf{U}_j^*$, so that no interference is caused to PR because $\mathbf{G}_j^T\mathbf{U}_j^* = \mathbf{0}$. Therefore, the main task for CR-T1 (CR-T2) in the learning stage is to blindly estimate the noise subspace matrix $\mathbf{U}_1$ ($\mathbf{U}_2$) from the received signal covariance matrix $\mathbf{R}_1$ ($\mathbf{R}_2$).
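To make the ideal-case learning step concrete, the following minimal NumPy sketch builds $\mathbf{R}_1$ from (3), extracts $\mathbf{U}_1$ by an EVD as in (4), and checks the two nulling properties used above. The dimensions and the values of $\alpha$, $\sigma_s^2$, and $\sigma_{n_1}^2$ are illustrative assumptions, and the helper names `cscg` and `noise_subspace` are hypothetical.

```python
import numpy as np

def cscg(rng, *shape):
    """i.i.d. CSCG entries with zero mean and unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def noise_subspace(R, M_p):
    """Orthonormal M x (M - M_p) basis of the noise subspace of a Hermitian R.

    np.linalg.eigh returns eigenvalues in ascending order, so the first
    M - M_p eigenvectors belong to the smallest (noise) eigenvalues.
    """
    _, vecs = np.linalg.eigh(R)
    return vecs[:, : R.shape[0] - M_p]

rng = np.random.default_rng(0)
M1, Mp = 4, 2
alpha, sigma_s2, sigma_n2 = 0.5, 10.0, 1.0        # illustrative values

G1 = cscg(rng, M1, Mp)                             # PR -> CR-T1 channel
Q1 = alpha * sigma_s2 * G1 @ G1.conj().T           # Q_1 as defined in (3)
R1 = Q1 + sigma_n2 * np.eye(M1)                    # ideal covariance R_1 in (3)
U1 = noise_subspace(R1, Mp)                        # noise subspace in (4)

print(np.abs(U1.conj().T @ G1).max())              # U_1^H G_1 is numerically zero
print(np.abs(G1.T @ U1.conj()).max())              # G_1^T U_1^* = 0: no interference at PR
```

Both printed values should be at the level of floating-point round-off.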

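2) Practical Case: Given the finite number of samples received from PR, the sample covariance matrix of the received signals at each CR terminal is computed as

$$\hat{\mathbf{R}}_j = \frac{1}{N_l}\sum_{n=1}^{N_l}\mathbf{y}_j(n)\mathbf{y}_j^H(n), \qquad j = 1, 2. \qquad (5)$$

The EVD of $\hat{\mathbf{R}}_j$ is written as

$$\hat{\mathbf{R}}_j = \hat{\mathbf{V}}_j\hat{\boldsymbol{\Sigma}}_j\hat{\mathbf{V}}_j^H + \hat{\mathbf{U}}_j\hat{\boldsymbol{\Lambda}}_j\hat{\mathbf{U}}_j^H, \qquad (6)$$

where $\hat{\boldsymbol{\Sigma}}_j$ and $\hat{\boldsymbol{\Lambda}}_j$ contain the $M_p$ largest and the remaining $M_j - M_p$ eigenvalues of $\hat{\mathbf{R}}_j$, respectively. From [11], the first-order perturbation of the noise subspace due to the finite number of received samples can be approximated by

$$\Delta\mathbf{U}_j = \hat{\mathbf{U}}_j - \mathbf{U}_j \approx -\mathbf{Q}_j^\dagger\Delta\mathbf{R}_j\mathbf{U}_j, \qquad (7)$$

where $\Delta\mathbf{R}_j = \hat{\mathbf{R}}_j - \mathbf{R}_j$.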
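The practical case can be emulated in the same way. The sketch below draws $N_l$ samples according to (1) and (2), forms $\hat{\mathbf{R}}_j$ as in (5), and measures the leakage $\|\hat{\mathbf{U}}_j^H\mathbf{G}_j\|_F = \|\Delta\mathbf{U}_j^H\mathbf{G}_j\|_F$, which vanishes under perfect learning. It assumes Gaussian PR symbols and an i.i.d. on/off PR activity with probability $\alpha$; these modelling choices, the parameter values, and the function names are illustrative.

```python
import numpy as np

def cscg(rng, *shape):
    """i.i.d. CSCG entries with zero mean and unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def learn_noise_subspace(rng, G, N_l, alpha, sigma_s2, sigma_n2):
    """Estimate U_j from N_l received samples, following (1), (2), (5), and (6).

    Modelling assumption: PR sends Gaussian symbols and is active in each
    symbol period independently with probability alpha.
    """
    M, M_p = G.shape
    active = rng.random(N_l) < alpha                        # PR on/off pattern
    s = np.sqrt(sigma_s2) * cscg(rng, M_p, N_l) * active    # s_p(n), zero when PR is idle
    y = G @ s + np.sqrt(sigma_n2) * cscg(rng, M, N_l)       # y_j(n) in (2)
    R_hat = y @ y.conj().T / N_l                            # sample covariance (5)
    _, vecs = np.linalg.eigh(R_hat)                         # EVD (6), ascending order
    return vecs[:, : M - M_p]                               # estimated noise subspace

def leakage(U_hat, G):
    """||U_hat^H G||_F = ||Delta U^H G||_F, which is zero for perfect learning."""
    return np.linalg.norm(U_hat.conj().T @ G)

rng = np.random.default_rng(1)
G = cscg(rng, 4, 2)
for N_l in (10, 100, 1000, 10000):
    U_hat = learn_noise_subspace(rng, G, N_l, alpha=0.5, sigma_s2=10.0, sigma_n2=1.0)
    print(N_l, leakage(U_hat, G))
```

The leakage is expected to shrink roughly like $1/\sqrt{N_l}$, in line with the first-order analysis (7).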

B. Data Transmission Stage 

Before discussing the channel training stage, we first need to identify the channels required for data detection at both CR terminals. We therefore present the data transmission stage first.

Suppose that, on average, CR-T1 transmits over $\theta N_d$ symbol periods whose indices belong to the set $\mathcal{N}_{d1}$, and CR-T2 transmits over the remaining $(1-\theta)N_d$ symbol periods whose indices belong to the set $\mathcal{N}_{d2}$, where $0 \le \theta \le 1$ is a prescribed constant. Note that $\mathcal{N}_{d1} \cup \mathcal{N}_{d2} = \{N_l + N_t + 1, N_l + N_t + 2, \ldots, N-1, N\}$ and $\mathcal{N}_{d1} \cap \mathcal{N}_{d2} = \emptyset$. Denote the encoded signal vectors from CR-T1 and CR-T2 at symbol period $n$ as $\mathbf{d}_1(n)$ and $\mathbf{d}_2(n)$, respectively. We look into the following two cases:
1) Ideal Case: To protect PR, $\mathbf{d}_j(n)$ is precoded by $\mathbf{U}_j^*$ according to the cognitive transmit beamforming introduced earlier. The received signals at CR-T1 and CR-T2 are

$$\mathbf{y}_1(n) = \mathbf{H}^T\mathbf{U}_2^*\mathbf{d}_2(n) + \mathbf{G}_1\mathbf{s}_p(n) + \mathbf{z}_1(n), \qquad n \in \mathcal{N}_{d2}, \qquad (8a)$$
$$\mathbf{y}_2(n) = \mathbf{H}\mathbf{U}_1^*\mathbf{d}_1(n) + \mathbf{G}_2\mathbf{s}_p(n) + \mathbf{z}_2(n), \qquad n \in \mathcal{N}_{d1}, \qquad (8b)$$

respectively. Note that for the CR system, not only the interference from CR to PR but also that from PR to CR needs to be handled; the latter is not considered in [9]. From (8b), CR-T1 needs $\mathbf{H}\mathbf{U}_1^*$ and $\mathbf{R}_2$ to determine the optimal transmit covariance matrix for $\mathbf{d}_1(n)$ [12]; and from (8a), CR-T1 needs $\mathbf{H}^T\mathbf{U}_2^*$ and $\mathbf{R}_1$ to decode the signal from CR-T2. Similar discussions hold for CR-T2.

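If we work on model (8) directly and train the channel, then CR-T1 can only estimate $\mathbf{H}^T\mathbf{U}_2^*$, while CR-T2 can only estimate $\mathbf{H}\mathbf{U}_1^*$. The knowledge of $\mathbf{H}\mathbf{U}_1^*$ and $\mathbf{R}_2$ has to be fed back from CR-T2 to CR-T1, and that of $\mathbf{H}^T\mathbf{U}_2^*$ and $\mathbf{R}_1$ has to be fed back from CR-T1 to CR-T2. To relieve the burden of both channel estimation and feedback,³ we propose to use cognitive receive beamforming at both CR terminals, i.e., CR-T1 and CR-T2 left-multiply their received signals by $\mathbf{U}_1^H$ and $\mathbf{U}_2^H$, respectively, and obtain

$$\tilde{\mathbf{y}}_1(n) = \mathbf{U}_1^H\mathbf{H}^T\mathbf{U}_2^*\mathbf{d}_2(n) + \mathbf{U}_1^H\mathbf{z}_1(n) = \mathbf{F}^T\mathbf{d}_2(n) + \tilde{\mathbf{z}}_1(n), \qquad (9a)$$
$$\tilde{\mathbf{y}}_2(n) = \mathbf{U}_2^H\mathbf{H}\mathbf{U}_1^*\mathbf{d}_1(n) + \mathbf{U}_2^H\mathbf{z}_2(n) = \mathbf{F}\mathbf{d}_1(n) + \tilde{\mathbf{z}}_2(n), \qquad (9b)$$

respectively, where $\mathbf{F}$ and $\tilde{\mathbf{z}}_j(n)$, $j = 1, 2$, represent the equivalent channel and noise, respectively.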
Some observations are made here: 

• The equivalent channels between the CRs become reciprocal, which offers the following advantages:

- We can estimate the channel at one CR terminal only and then feed it back to the other terminal, which reduces the feedback burden;

- We can estimate the channel at both CR terminals and eliminate the need for channel feedback.

• The interference from PR is completely removed at both CR terminals.

• The resultant noise $\tilde{\mathbf{z}}_j(n)$ is still white Gaussian.
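The reciprocity and interference-removal properties listed above can be checked numerically. The sketch below assumes ideal learning and, for brevity, obtains each $\mathbf{U}_j$ from an SVD of $\mathbf{G}_j$ rather than from the EVD of $\mathbf{R}_j$ (both give a basis of the orthogonal complement of the range of $\mathbf{G}_j$); the antenna numbers and the helper name `null_basis` are illustrative.

```python
import numpy as np

def cscg(rng, *shape):
    """i.i.d. CSCG entries with zero mean and unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def null_basis(G):
    """Orthonormal basis of the orthogonal complement of range(G), via the full SVD."""
    u, _, _ = np.linalg.svd(G)           # u is M x M, columns ordered by singular value
    return u[:, G.shape[1]:]             # last M - M_p columns satisfy u^H G = 0

rng = np.random.default_rng(2)
M1, M2, Mp = 4, 5, 2
G1, G2 = cscg(rng, M1, Mp), cscg(rng, M2, Mp)
H = cscg(rng, M2, M1)                    # CR-T1 -> CR-T2; the reverse link is H^T

U1, U2 = null_basis(G1), null_basis(G2)
F = U2.conj().T @ H @ U1.conj()          # forward equivalent channel in (9b)
F_rev = U1.conj().T @ H.T @ U2.conj()    # reverse equivalent channel in (9a)

print(np.allclose(F_rev, F.T))           # reciprocity: the reverse channel is F^T
print(np.abs(U2.conj().T @ G2).max())    # PR interference removed at CR-T2
```

Here $\mathbf{F}$ is $K_2 \times K_1 = 3 \times 2$, so a single estimate of $\mathbf{F}$ serves both link directions.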

³ Note that the bandwidth of the CR feedback channel is also limited, since the CR is an unlicensed user and is unlikely to be granted much bandwidth.
2) Practical Case: With finite learning time, only the estimates $\hat{\mathbf{U}}_j$ can be obtained. After applying the proposed cognitive beamforming, the two CR terminals receive

$$\tilde{\mathbf{y}}_1(n) = \mathbf{F}^T\mathbf{d}_2(n) + \Delta\mathbf{U}_1^H\mathbf{G}_1\mathbf{s}_p(n) + \tilde{\mathbf{z}}_1(n), \qquad (10a)$$
$$\tilde{\mathbf{y}}_2(n) = \mathbf{F}\mathbf{d}_1(n) + \Delta\mathbf{U}_2^H\mathbf{G}_2\mathbf{s}_p(n) + \tilde{\mathbf{z}}_2(n), \qquad (10b)$$

where $\mathbf{F}$ and $\tilde{\mathbf{z}}_j(n)$ are now redefined as $\hat{\mathbf{U}}_2^H\mathbf{H}\hat{\mathbf{U}}_1^*$ and $\hat{\mathbf{U}}_j^H\mathbf{z}_j(n)$, respectively, and the residual terms follow from $\hat{\mathbf{U}}_j^H\mathbf{G}_j = \Delta\mathbf{U}_j^H\mathbf{G}_j$.

Remark 3.1: With imperfect learning, the channel is still reciprocal and the noise distribution is the same as in the perfect learning case. However, there exists residual interference from PR at the CR receivers. Although the interference statistics need to be fed back from one CR terminal to the other for designing the transmit signal covariance, we will see later that only little feedback is actually needed, owing to the special structure of $\Delta\mathbf{U}_j^H\mathbf{G}_j\mathbf{s}_p(n)$. Therefore, the advantages of the perfect learning case are mostly maintained even with imperfect learning.

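To obtain essential insights for the optimal design, in the sequel we focus on the simplest case $\theta = 1$, i.e., transmission takes place only from CR-T1 to CR-T2. The discussion for a general value of $\theta$ can be carried out with a similar approach but is omitted here for brevity.⁴ The covariance matrix of the residual interference $\Delta\mathbf{U}_2^H\mathbf{G}_2\mathbf{s}_p(n)$ can be expressed as

$$\boldsymbol{\Sigma}_2 = E[\Delta\mathbf{U}_2^H\mathbf{G}_2\mathbf{s}_p(n)\mathbf{s}_p^H(n)\mathbf{G}_2^H\Delta\mathbf{U}_2] = E[\Delta\mathbf{U}_2^H\mathbf{Q}_2\Delta\mathbf{U}_2]. \qquad (11)$$

From [13, Eq. (30)] and the fact that $\Delta\mathbf{R}_2 = \Delta\mathbf{R}_2^H$, we know

$$E[\Delta\mathbf{R}_2\boldsymbol{\Psi}\Delta\mathbf{R}_2] = \frac{1}{N_l}\mathrm{tr}(\boldsymbol{\Psi}\mathbf{R}_2)\mathbf{R}_2 \qquad (12)$$

for any matrix $\boldsymbol{\Psi}$. Then, we have

$$\boldsymbol{\Sigma}_2 = E[\mathbf{U}_2^H\Delta\mathbf{R}_2\mathbf{Q}_2^\dagger\mathbf{Q}_2\mathbf{Q}_2^\dagger\Delta\mathbf{R}_2\mathbf{U}_2] = \frac{\mathrm{tr}(\mathbf{Q}_2^\dagger\mathbf{R}_2)}{N_l}\mathbf{U}_2^H\mathbf{R}_2\mathbf{U}_2 \overset{(a)}{=} \frac{\sigma_{n_2}^2\big(\mathrm{tr}(\mathbf{Q}_2^\dagger\mathbf{Q}_2) + \sigma_{n_2}^2\mathrm{tr}(\mathbf{Q}_2^\dagger)\big)}{N_l}\mathbf{I} = \frac{\sigma_{n_2}^2\big(M_p + \sigma_{n_2}^2\mathrm{tr}(\mathbf{Q}_2^\dagger)\big)}{N_l}\mathbf{I} \triangleq \frac{\beta_2}{N_l}\mathbf{I}, \qquad (13)$$

where (a) uses the property $\mathbf{U}_2^H\mathbf{Q}_2 = \mathbf{0}$, and $\beta_2$ is defined accordingly.

⁴ Discussing the general case of $\theta$ requires more complex mathematical derivations, which could, at least, be carried out by a brute-force search. However, such an approach would obscure the exposition of our learning-based cognitive radio scheme and is not the focus of this paper.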

Remark 3.2: Interestingly, the residual interference components in all dimensions of $\tilde{\mathbf{y}}_2(n)$ are uncorrelated and have the same power, $\beta_2/N_l$. To assist the source covariance design at CR-T1, only the scalar $\beta_2$ needs to be fed back from CR-T2, which is much easier than feeding back the whole covariance matrix $\mathbf{R}_2$. This explains our earlier claim in Remark 3.1 that only a small amount of feedback is needed to account for the residual interference from PR.
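The isotropic structure in (13), which is what makes feeding back the single scalar $\beta_2$ sufficient, can be checked by a Monte Carlo simulation of the first-order model (7). The sketch below assumes $\alpha = 1$ and Gaussian PR symbols, so that the samples are exactly $\mathcal{CN}(\mathbf{0}, \mathbf{R}_2)$ and (12) holds exactly; all sizes and powers, as well as the helper names, are illustrative assumptions.

```python
import numpy as np

def cscg(rng, *shape):
    """i.i.d. CSCG entries with zero mean and unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def pinv_rank(Q, r):
    """Pseudo-inverse of a Hermitian PSD matrix of known rank r, via its top-r EVD."""
    vals, vecs = np.linalg.eigh(Q)
    V, lam = vecs[:, -r:], vals[-r:]
    return (V / lam) @ V.conj().T

def beta(Q, sigma_n2, M_p):
    """beta = sigma_n^2 (M_p + sigma_n^2 tr(Q^dagger)), cf. (13)."""
    return sigma_n2 * (M_p + sigma_n2 * np.trace(pinv_rank(Q, M_p)).real)

rng = np.random.default_rng(3)
M2, Mp, N_l, trials = 4, 2, 200, 4000
sigma_s2, sigma_n2 = 10.0, 1.0
# Simplifying assumption: PR always on (alpha = 1) with Gaussian symbols.
G2 = cscg(rng, M2, Mp)
Q2 = sigma_s2 * G2 @ G2.conj().T
R2 = Q2 + sigma_n2 * np.eye(M2)
U2 = np.linalg.eigh(R2)[1][:, : M2 - Mp]     # true noise subspace
Q2_pinv = pinv_rank(Q2, Mp)
L = np.linalg.cholesky(R2)                   # R2 = L L^H, used to draw y ~ CN(0, R2)

acc = np.zeros((M2 - Mp, M2 - Mp), dtype=complex)
for _ in range(trials):
    y = L @ cscg(rng, M2, N_l)
    dR = y @ y.conj().T / N_l - R2           # Delta R_2
    dU = -Q2_pinv @ dR @ U2                  # first-order perturbation (7)
    acc += dU.conj().T @ Q2 @ dU             # integrand of (11)
Sigma2_mc = acc / trials

print(np.round(Sigma2_mc.real, 5))           # approximately a scaled identity
print(beta(Q2, sigma_n2, Mp) / N_l)          # predicted per-dimension power beta_2 / N_l
```

The diagonal entries of the simulated $\boldsymbol{\Sigma}_2$ should cluster around $\beta_2/N_l$, with off-diagonal entries close to zero.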

Remark 3.3: Computing $\beta_2$ requires some care. Since the exact value of $\mathbf{Q}_2$ is not available at CR-T2, we may replace $\mathbf{Q}_2$ by its maximum-likelihood (ML) estimate $\hat{\mathbf{Q}}_2$, which can be obtained from $\hat{\mathbf{R}}_2$ according to the algorithms in [9].

Another impact of imperfect channel learning is the CR's residual interference to PR, which is normally characterized by the IT, defined as the total interference power at PR [8]. For CR-T1, e.g., it is expressed as

$$I_{d_1} = E[\|\mathbf{G}_1^T\hat{\mathbf{U}}_1^*\mathbf{d}_1(n)\|^2] = E[\|\mathbf{G}_1^T\Delta\mathbf{U}_1^*\mathbf{d}_1(n)\|^2]. \qquad (14)$$

Although a more accurate characterization would be the performance loss at PR due to the interference [14], such a metric requires certain cooperation between the CR and PR. Nonetheless, the IT has been shown to be effective in upper-bounding the capacity loss at PR [8], [14].

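Define $\mathbf{R}_{d_1} = E[\mathbf{d}_1(n)\mathbf{d}_1^H(n)]$ as the transmit covariance matrix of CR-T1. It can be further shown that

$$I_{d_1} \overset{(a)}{=} \frac{\sigma_{n_1}^2\,\mathrm{tr}(\mathbf{R}_{d_1})\big(\mathrm{tr}(\mathbf{G}_1^H\mathbf{Q}_1^\dagger\mathbf{G}_1) + \sigma_{n_1}^2\mathrm{tr}(\mathbf{G}_1^H\mathbf{Q}_1^\dagger\mathbf{Q}_1^\dagger\mathbf{G}_1)\big)}{N_l} = \frac{\sigma_{n_1}^2\big(M_p + \sigma_{n_1}^2\mathrm{tr}(\mathbf{Q}_1^\dagger)\big)\mathrm{tr}(\mathbf{R}_{d_1})}{\alpha\sigma_s^2N_l} \triangleq \frac{\beta_1\,\mathrm{tr}(\mathbf{R}_{d_1})}{\alpha\sigma_s^2N_l}, \qquad (15)$$

where (a) comes from $\mathbf{U}_1^H\mathbf{Q}_1 = \mathbf{0}$, the second equality uses $\mathbf{G}_1\mathbf{G}_1^H = \mathbf{Q}_1/(\alpha\sigma_s^2)$, and $\beta_1$ is defined as the corresponding term. An important observation is that the IT is inversely proportional to the learning time $N_l$.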

Example 3.1: Consider a CR system with parameters $M_p = 2$, $\alpha = 0.5$, $M_1 = M_2 = 4$, $N = 1000$, and $\sigma_{n_1}^2 = 1$. We numerically examine the theoretical expression of the IT for $\sigma_s^2 = \ldots$ dB and $\sigma_s^2 = 20$ dB. The ML estimate $\hat{\mathbf{Q}}_1$ is used to compute $\beta_1$ for different values of $N_l$, and a total of 10,000 Monte Carlo runs are used for averaging. The figure of merit is the inverse of the normalized IT, $1/(\sigma_s^2 I_{d_1})$. As shown in Fig. 3, the numerical and theoretical results match each other quite well. A higher value of $\sigma_s^2$ yields a lower IT due to the smaller $\beta_1$.

Suppose the acceptable IT at PR is no more than $\zeta$. Then, the source covariance design at CR-T1 must satisfy the constraint

$$\mathrm{tr}(\mathbf{R}_{d_1}) \le \frac{\zeta\alpha\sigma_s^2N_l}{\beta_1} = \chi_1N_l, \qquad (16)$$

where $\chi_1$ is defined as $\chi_1 = \zeta\alpha\sigma_s^2/\beta_1$.
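The constraint (16) translates directly into a data-power cap that grows linearly with $N_l$. The following sketch, with illustrative numbers and hypothetical helper names, evaluates (15) at that cap.

```python
def it_data(beta1, alpha, sigma_s2, N_l, tr_Rd):
    """IT at PR during data transmission, I_d1 in (15)."""
    return beta1 * tr_Rd / (alpha * sigma_s2 * N_l)

def chi(zeta, alpha, sigma_s2, beta1):
    """chi_1 = zeta * alpha * sigma_s^2 / beta_1, as defined below (16)."""
    return zeta * alpha * sigma_s2 / beta1

# Illustrative numbers only (not taken from the paper's simulations).
beta1, alpha, sigma_s2, zeta = 2.0, 0.5, 10.0, 0.01
for N_l in (50, 100, 200):
    p_max = chi(zeta, alpha, sigma_s2, beta1) * N_l    # largest admissible tr(R_d1)
    print(N_l, p_max, it_data(beta1, alpha, sigma_s2, N_l, p_max))
```

With these numbers $\chi_1 = 0.025$, so the admissible data power grows from 1.25 at $N_l = 50$ to 5 at $N_l = 200$, while the resulting IT stays at $\zeta = 0.01$.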

Remark 3.4: Note that the parameter $\zeta\alpha\sigma_s^2$ has to be obtained by the CR via some dedicated means. For example, PR could report this single parameter to a central controller from time to time, from which the CR could obtain it directly. However, the CR does not need to know the instantaneous status of PR, i.e., whether it is transmitting or receiving.

Remark 3.5: From (16), CR-T1 needs to know $\beta_1$ before designing the system parameters $N_l$, $N_t$, and $N_d$. However, as shown in Remark 3.3, computing $\beta_1$ depends on $\hat{\mathbf{Q}}_1$, which is only available after the learning stage. This looks like a chicken-and-egg problem. Fortunately, it can be shown that $\beta_1$ varies negligibly once $N_l$ becomes large. From the first-order perturbation analysis in (7), we know that $\hat{\mathbf{Q}}_1 - \mathbf{Q}_1$ is of the order $O(1/\sqrt{N_l})$. Hence, $\mathrm{tr}(\hat{\mathbf{Q}}_1^\dagger) = \mathrm{tr}(\mathbf{Q}_1^\dagger) + O(1/\sqrt{N_l})$ does not vary much when $N_l$ is large and eventually converges to $\mathrm{tr}(\mathbf{Q}_1^\dagger)$. For practical implementation, we may let CR-T1 learn the channel dynamically while checking whether $\beta_1$ has settled to a relatively stable value. Suppose $\beta_1$ becomes relatively stable after the CR learns the channel for $N_0$ symbol periods. Then, CR-T1 can compute the optimized parameter $N_l$ according to the algorithms given in the next section. If the optimal $N_l$ is smaller than $N_0$, CR-T1 immediately proceeds to the channel training stage; otherwise, CR-T1 keeps learning for another $N_l - N_0$ symbol periods. Therefore, in the design of the system parameters, there is no harm in treating $\beta_1$ as a known constant, which also makes $\chi_1$ a constant.
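The learn-then-decide procedure of Remark 3.5 might be organized as in the sketch below. It is only an illustration under stated assumptions: the stopping rule (relative change of $\hat{\beta}_1$ between consecutive blocks below a tolerance), the block length, the plug-in estimate of $\mathbf{Q}_1$ used in place of the ML estimate of [9], the received-signal model, and the value `N_l_opt`, which merely stands in for the output of the Section IV optimization, are all hypothetical.

```python
import numpy as np

def cscg(rng, *shape):
    """i.i.d. CSCG entries with zero mean and unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def beta_hat(R_hat, M_p):
    """Plug-in estimate of beta_1 = s2 (M_p + s2 tr(Q^dagger)).

    Hypothetical stand-in for the ML estimate of Q_1 from [9]: the noise power
    s2 is the mean of the M - M_p smallest eigenvalues, and Q_1 is rebuilt as
    V (Sigma - s2 I) V^H from the M_p largest ones, so tr(Q^dagger) = sum 1/(lambda - s2).
    """
    vals = np.linalg.eigvalsh(R_hat)                   # ascending eigenvalues
    M = R_hat.shape[0]
    s2 = vals[: M - M_p].mean()
    excess = np.maximum(vals[M - M_p:] - s2, 1e-12)    # guard against division by ~0
    return s2 * (M_p + s2 * np.sum(1.0 / excess))

def learn_until_stable(sample, M, M_p, block=10, tol=0.02, max_n=10_000):
    """Learn in blocks until beta_hat changes by less than tol (relative).

    sample(k) must return an M x k array of received vectors y_1(n).
    Returns (N_0, beta_hat) with N_0 the number of symbol periods used.
    """
    R_sum = np.zeros((M, M), dtype=complex)
    n, prev = 0, None
    while n < max_n:
        y = sample(block)
        R_sum += y @ y.conj().T
        n += block
        b = beta_hat(R_sum / n, M_p)
        if prev is not None and abs(b - prev) < tol * prev:
            return n, b
        prev = b
    return n, prev

rng = np.random.default_rng(4)
G1, alpha, sigma_s2 = cscg(rng, 4, 2), 0.5, 10.0

def sample(k):
    """k received samples at CR-T1 with an on/off PR and unit noise power (illustrative)."""
    on = rng.random(k) < alpha
    return G1 @ (np.sqrt(sigma_s2) * cscg(rng, 2, k) * on) + cscg(rng, 4, k)

N0, b1 = learn_until_stable(sample, M=4, M_p=2)
N_l_opt = 300                    # hypothetical output of the Section IV optimization
extra = max(N_l_opt - N0, 0)     # further learning periods before training starts
print(N0, b1, extra)
```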

Example 3.2: We consider the same system setup as in Example 3.1 and examine the variation of $\beta_1$ with respect to the learning time $N_l$. Both the theoretical and numerical values of $\beta_1$ are shown in Fig. 4, where the former is obtained from the true matrix $\mathbf{Q}_1$ and the latter from $\hat{\mathbf{Q}}_1$. It is seen that there always exists some value of $N_0$ beyond which $\beta_1$ becomes relatively stable. For example, with a PR transmit SNR of $\sigma_s^2 = \ldots$ dB, taking $N_0 = 200$ guarantees a stable $\beta_1$, while for a higher SNR of $\sigma_s^2 = 20$ dB, $N_0$ as small as 10 is sufficient.

C. Channel Training Stage 

The targets of channel estimation for the CR link in the channel training stage are $\mathbf{F}$ at CR-T2 and $\mathbf{F}^T$ at CR-T1. Thanks to the proposed transmit and receive cognitive beamforming, which yields a pair of reciprocal channels, we may either train the channel from both directions and thereby eliminate any feedback, or train the channel from one direction only and then feed the estimate back from one CR terminal to the other. In this paper, we adopt the second approach to obtain a tractable and insightful analysis; the first approach does not change the basic principle but complicates the discussion.

Without loss of generality, we assume $M_1 \le M_2$ and let CR-T1 send the training sequence to CR-T2. To protect PR, the training signal from CR-T1, denoted by $\mathbf{t}_1(n)$, must also be precoded by the matrix $\hat{\mathbf{U}}_1^*$. The received signal at CR-T2, after receive beamforming, is then given by

$$\tilde{\mathbf{y}}_2(n) = \mathbf{F}\mathbf{t}_1(n) + \Delta\mathbf{U}_2^H\mathbf{G}_2\mathbf{s}_p(n) + \tilde{\mathbf{z}}_2(n), \qquad N_l + 1 \le n \le N_l + N_t. \qquad (17)$$

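Denote

$$\tilde{\mathbf{Y}}_2 = [\tilde{\mathbf{y}}_2(N_l+1), \tilde{\mathbf{y}}_2(N_l+2), \ldots, \tilde{\mathbf{y}}_2(N_l+N_t)],$$
$$\mathbf{T}_1 = [\mathbf{t}_1(N_l+1), \mathbf{t}_1(N_l+2), \ldots, \mathbf{t}_1(N_l+N_t)],$$
$$\mathbf{S}_p = [\mathbf{s}_p(N_l+1), \mathbf{s}_p(N_l+2), \ldots, \mathbf{s}_p(N_l+N_t)],$$
$$\tilde{\mathbf{Z}}_2 = [\tilde{\mathbf{z}}_2(N_l+1), \tilde{\mathbf{z}}_2(N_l+2), \ldots, \tilde{\mathbf{z}}_2(N_l+N_t)],$$

so that (17) can be written compactly as $\tilde{\mathbf{Y}}_2 = \mathbf{F}\mathbf{T}_1 + \Delta\mathbf{U}_2^H\mathbf{G}_2\mathbf{S}_p + \tilde{\mathbf{Z}}_2$. The covariance matrix of $\mathbf{F}$ is then computed as

$$\mathbf{R}_F = E[\mathbf{F}^H\mathbf{F}] = E[\hat{\mathbf{U}}_1^T\mathbf{H}^H\hat{\mathbf{U}}_2\hat{\mathbf{U}}_2^H\mathbf{H}\hat{\mathbf{U}}_1^*] = K_2\mathbf{I}, \qquad (18)$$

where $K_j = M_j - M_p$, $j = 1, 2$, for notational simplicity.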

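The linear-minimum-mean-square-error (LMMSE) based channel estimator for $\mathbf{F}$ can be obtained as [15]

$$\hat{\mathbf{F}} = \tilde{\mathbf{Y}}_2\big(\mathbf{T}_1^H\mathbf{R}_F\mathbf{T}_1 + E[\mathbf{S}_p^H\mathbf{G}_2^H\Delta\mathbf{U}_2\Delta\mathbf{U}_2^H\mathbf{G}_2\mathbf{S}_p] + \sigma_{n_2}^2K_2\mathbf{I}\big)^{-1}\mathbf{T}_1^H\mathbf{R}_F, \qquad (19)$$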






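and $E[\mathbf{S}_p^H\mathbf{G}_2^H\Delta\mathbf{U}_2\Delta\mathbf{U}_2^H\mathbf{G}_2\mathbf{S}_p]$ is separately computed as

$$E[\mathbf{S}_p^H\mathbf{G}_2^H\Delta\mathbf{U}_2\Delta\mathbf{U}_2^H\mathbf{G}_2\mathbf{S}_p] = \frac{K_2\beta_2}{N_l}\mathbf{I}, \qquad (20)$$

where we use the property that the $\mathbf{s}_p(n)$'s are temporally and spatially independent, together with (11) and (13). Substituting (18) and (20) into (19), we obtain

$$\hat{\mathbf{F}} = \tilde{\mathbf{Y}}_2\Big(\mathbf{T}_1^H\mathbf{T}_1 + \Big(\frac{\beta_2}{N_l} + \sigma_{n_2}^2\Big)\mathbf{I}\Big)^{-1}\mathbf{T}_1^H \triangleq \tilde{\mathbf{Y}}_2(\mathbf{T}_1^H\mathbf{T}_1 + \gamma_2\mathbf{I})^{-1}\mathbf{T}_1^H, \qquad (21)$$

where $\gamma_2$ is defined accordingly. Let $\Delta\mathbf{F} = \mathbf{F} - \hat{\mathbf{F}}$. By the nature of LMMSE estimation, $\Delta\mathbf{F}$ is uncorrelated with $\hat{\mathbf{F}}$. The rows of $\Delta\mathbf{F}$ are mutually uncorrelated, and each has the covariance

$$\mathbf{R}_{\Delta F} = \Big(\mathbf{I} + \frac{1}{\gamma_2}\mathbf{T}_1\mathbf{T}_1^H\Big)^{-1}. \qquad (22)$$

Moreover, the covariance matrix of each row of $\hat{\mathbf{F}}$ can be calculated as

$$\mathbf{R}_{\hat{F}} = \mathbf{T}_1(\mathbf{T}_1^H\mathbf{T}_1 + \gamma_2\mathbf{I})^{-1}\mathbf{T}_1^H = \mathbf{I} - \mathbf{R}_{\Delta F}. \qquad (23)$$

Assuming the $\mathbf{s}_p(n)$'s to be Gaussian, the entries of $\hat{\mathbf{F}}$ and $\Delta\mathbf{F}$ are easily seen to be Gaussian distributed for a given $\mathbf{G}_2$.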
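The estimator (21) and the covariance relations (22) and (23) can be verified numerically. The sketch below checks the identity (23) exactly and compares the empirical per-row MSE with $\mathrm{tr}(\mathbf{R}_{\Delta F})$. It models the residual PR interference plus noise in (17) as white Gaussian with variance $\gamma_2$ per entry, which matches the second-order statistics in (20) and (21) but is Gaussian by assumption; all sizes and names are illustrative.

```python
import numpy as np

def cscg(rng, *shape):
    """i.i.d. CSCG entries with zero mean and unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def lmmse_estimate(Y, T, gamma):
    """F_hat = Y (T^H T + gamma I)^{-1} T^H, cf. (21)."""
    A = T.conj().T @ T + gamma * np.eye(T.shape[1])
    return Y @ np.linalg.solve(A, T.conj().T)

def error_cov(T, gamma):
    """Per-row error covariance R_dF = (I + T T^H / gamma)^{-1}, cf. (22)."""
    return np.linalg.inv(np.eye(T.shape[0]) + T @ T.conj().T / gamma)

rng = np.random.default_rng(5)
K1, K2, N_t, gamma = 2, 3, 8, 0.5                  # illustrative; gamma plays gamma_2
T = cscg(rng, K1, N_t)                             # arbitrary (non-optimized) training
R_dF = error_cov(T, gamma)
R_Fhat = T @ np.linalg.solve(T.conj().T @ T + gamma * np.eye(N_t), T.conj().T)
print(np.allclose(R_Fhat + R_dF, np.eye(K1)))      # identity (23)

# Monte Carlo MSE per row of F (residual PR interference + noise modelled as white).
trials, sq_err = 4000, 0.0
for _ in range(trials):
    F = cscg(rng, K2, K1)                          # unit-variance entries, so R_F = K2 I
    Y = F @ T + np.sqrt(gamma) * cscg(rng, K2, N_t)
    sq_err += np.linalg.norm(F - lmmse_estimate(Y, T, gamma)) ** 2
print(sq_err / (trials * K2), np.trace(R_dF).real)  # empirical vs. tr(R_dF)
```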

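Due to imperfect learning, the residual interference $\mathbf{G}_1^T\Delta\mathbf{U}_1^*\mathbf{t}_1(n)$ at PR is nonzero. The IT caused during training is computed as

$$I_{t_1}(n) = E[\|\mathbf{G}_1^T\hat{\mathbf{U}}_1^*\mathbf{t}_1(n)\|^2] = \frac{\beta_1\|\mathbf{t}_1(n)\|^2}{\alpha\sigma_s^2N_l}. \qquad (24)$$

In fact, it is not possible to restrict the instantaneous interference $I_{t_1}(n)$ in each time slot. Therefore, we deal with the average interference over the entire training stage, defined as

$$I_{t,av} = \frac{1}{N_t}\sum_{n=N_l+1}^{N_l+N_t} I_{t_1}(n) = \frac{\beta_1\,\mathrm{tr}(\mathbf{T}_1\mathbf{T}_1^H)}{\alpha\sigma_s^2N_lN_t}. \qquad (25)$$

The IT constraint is then $I_{t,av} \le \zeta$, which is equivalent to

$$\mathrm{tr}(\mathbf{T}_1\mathbf{T}_1^H) \le \chi_1N_lN_t. \qquad (26)$$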
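As a quick arithmetic check of (25) and (26), the sketch below evaluates the average training IT exactly at the boundary of the constraint; dividing the budget by $N_t$ gives the per-symbol training power cap $\chi_1N_l$. The numbers are illustrative and `it_train_avg` is a hypothetical helper.

```python
def it_train_avg(beta1, alpha, sigma_s2, N_l, N_t, tr_TT):
    """Average IT during training, I_{t,av} in (25)."""
    return beta1 * tr_TT / (alpha * sigma_s2 * N_l * N_t)

# Illustrative numbers only: beta_1 = 2, alpha = 0.5, sigma_s^2 = 10, zeta = 0.01.
beta1, alpha, sigma_s2, zeta = 2.0, 0.5, 10.0, 0.01
chi1 = zeta * alpha * sigma_s2 / beta1            # chi_1 as defined below (16)
N_l, N_t = 100, 4
budget = chi1 * N_l * N_t                         # right-hand side of (26)
print(budget, it_train_avg(beta1, alpha, sigma_s2, N_l, N_t, budget))
print(budget / N_t)                               # per-symbol training power cap, chi_1 * N_l
```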
IV. CR Transmission Optimization

The tradeoff of power and time allocation between channel training and data transmission has been studied in, e.g., [16] for conventional multi-antenna systems. For the proposed CR scheme, however, an additional time period must be assigned for learning. Intuitively, a larger $N_l$ gives a better subspace estimate, so that both the interference to and from PR can be reduced via cognitive beamforming. However, increasing $N_l$ decreases $N_d$ for fixed $N_t$ and $N$, and thus reduces the overall system throughput. Meanwhile, the IT constraints during both training and data transmission must be taken into consideration, which restricts the freedom of power allocation. All the above issues make the pertinent analysis for the CR system nontrivial compared with the existing results in [16].

Similar to [16], we evaluate the performance of the proposed CR scheme via a lower bound on the system ergodic capacity, which accounts for both the channel estimation errors and the residual interference to and from PR. Based on this lower bound, the optimal power and time allocation over the CR's learning, training, and data transmission stages is derived, which provides insightful guidance for practical system design. We assume an error-free feedback channel from CR-T2 to CR-T1; the effect of imperfect feedback on the achievable transmission rate has been partly discussed in [17], [18]. As mentioned before, we only focus on the case of $\theta = 1$, i.e., CR-T1 transmits to CR-T2 during the entire data transmission stage.

Assume the total power that can be allocated to CR-T1 over one frame is $P$, and denote the average powers over all $M_1$ antennas during training and data transmission as $p_t$ and $p_d$, respectively; namely,

$$E\{\|\hat{\mathbf{U}}_1^*\mathbf{T}_1\|_F^2\} = \mathrm{tr}(\mathbf{T}_1\mathbf{T}_1^H) = p_tN_t, \qquad E\{\|\hat{\mathbf{U}}_1^*\mathbf{d}_1(n)\|^2\} = \mathrm{tr}(\mathbf{R}_{d_1}) = p_d, \qquad (27)$$

where $\|\cdot\|_F$ denotes the Frobenius norm. Note that the precoding matrix has to be taken into account when computing the power over the transmit antennas; since $\hat{\mathbf{U}}_1$ has orthonormal columns, precoding does not change the power. Conservation of time and power yields

$$N = N_l + N_t + N_d, \qquad P \ge p_tN_t + p_dN_d. \qquad (28)$$

Note that "$\ge$" is used in the power allocation constraint to account for the cases where $P$ cannot be fully utilized due to the IT constraints at PR.
A. Lower Bound on CR Ergodic Capacity

During the data transmission stage, the received signal at CR-T2 can be rewritten as

$$\tilde{\mathbf{y}}_2(n) = \hat{\mathbf{F}}\mathbf{d}_1(n) + \underbrace{\Delta\mathbf{F}\mathbf{d}_1(n) + \Delta\mathbf{U}_2^H\mathbf{G}_2\mathbf{s}_p(n) + \tilde{\mathbf{z}}_2(n)}_{\mathbf{v}_2(n)}, \qquad N_l + N_t + 1 \le n \le N, \qquad (29)$$

where $\mathbf{v}_2(n)$ is defined as the effective interference-plus-noise term. The covariance matrix of the second term on the right-hand side (RHS) of (29) can be computed as

$$E[\Delta\mathbf{F}\mathbf{d}_1(n)\mathbf{d}_1^H(n)\Delta\mathbf{F}^H] = \mathrm{tr}(\mathbf{R}_{d_1}\mathbf{R}_{\Delta F})\mathbf{I}, \qquad (30)$$

where the uncorrelatedness among the rows of $\Delta\mathbf{F}$ is utilized. Therefore, $\mathbf{v}_2(n)$ has the covariance

$$\mathbf{R}_{v_2} = \big(\mathrm{tr}(\mathbf{R}_{d_1}\mathbf{R}_{\Delta F}) + \gamma_2\big)\mathbf{I}. \qquad (31)$$

Note that $\mathbf{v}_2(n)$ is uncorrelated with the signal part $\hat{\mathbf{F}}\mathbf{d}_1(n)$; however, it is not necessarily independent of it.

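Since the channel is memoryless, the instantaneous mutual information (IMI) between the unknown data and the observations at CR-T2 satisfies

$$\mathcal{I}(\tilde{\mathbf{y}}_2(n), \tilde{\mathbf{Y}}_2, \mathbf{T}_1; \mathbf{d}_1(n)) \ge \mathcal{I}(\tilde{\mathbf{y}}_2(n), \hat{\mathbf{F}}; \mathbf{d}_1(n)) = \mathcal{I}(\tilde{\mathbf{y}}_2(n); \mathbf{d}_1(n) \mid \hat{\mathbf{F}}) + \underbrace{\mathcal{I}(\hat{\mathbf{F}}; \mathbf{d}_1(n))}_{=0}, \qquad N_l + N_t + 1 \le n \le N. \qquad (32)$$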

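Lemma 4.1: With instantaneous knowledge of the channel estimate $\hat{\mathbf{F}}$, the ergodic capacity of the CR channel is lower-bounded by

$$C \ge C_{L1} = \max_{\mathbf{T}_1} E_{\hat{F}}\Big[\max_{\mathbf{R}_{d_1}} \log\det\big(\mathbf{I} + \mathbf{R}_{v_2}^{-1}\hat{\mathbf{F}}\mathbf{R}_{d_1}\hat{\mathbf{F}}^H\big)\Big] \qquad (33)$$
$$\text{s.t.}\quad \mathrm{tr}(\mathbf{R}_{d_1}) = p_d \le \chi_1N_l, \qquad \mathrm{tr}(\mathbf{T}_1\mathbf{T}_1^H) = p_tN_t \le \chi_1N_lN_t,$$

where $E[\cdot]$ is taken over $\hat{\mathbf{F}}$, and the two constraints are due to the IT constraints (16) and (26), respectively.

Proof: See Appendix I. ■

Note that in (33), $\mathbf{R}_{d_1}$ is optimized inside $E[\cdot]$ because CR-T1 knows $\hat{\mathbf{F}}$ instantaneously. However, the training sequence has to be fixed for all channel realizations, and thus $\mathbf{T}_1$ is placed outside $E[\cdot]$.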






B. Optimizing Training Sequence 

Due to the difficulty of computing the optimal $\mathbf{T}_1$ from (33), we design the training sequence based on a different criterion, namely minimizing the channel estimation mean square error (MSE) $\mathrm{tr}(\mathbf{R}_{\Delta F})$, which is a widely adopted practical method for channel estimation [19], [20].⁵ A similar approach has also been suggested in [16], [21], [22] from different viewpoints. Hence, the optimal training design is found from the problem

$$\min_{\mathbf{T}_1}\ \mathrm{tr}\Big(\mathbf{I} + \frac{1}{\gamma_2}\mathbf{T}_1\mathbf{T}_1^H\Big)^{-1} \quad \text{s.t.}\quad \mathrm{tr}(\mathbf{T}_1\mathbf{T}_1^H) = p_tN_t, \qquad (34)$$

where we leave the IT constraint $p_t \le \chi_1N_l$ to the later optimization. By applying the geometric-arithmetic mean inequality, the optimal $\mathbf{T}_1\mathbf{T}_1^H$ is easily found to be $\frac{p_tN_t}{K_1}\mathbf{I}$, and the corresponding $\mathbf{R}_{\Delta F}$ is

$$\mathbf{R}_{\Delta F} = \frac{\gamma_2K_1}{\gamma_2K_1 + p_tN_t}\mathbf{I} \triangleq \eta\mathbf{I}. \qquad (35)$$
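The solution of (34) can be illustrated numerically. One way, among many, to meet $\mathbf{T}_1\mathbf{T}_1^H = \frac{p_tN_t}{K_1}\mathbf{I}$ is to take $K_1$ rows of a scaled $N_t$-point DFT matrix; this particular construction is our assumption and is not prescribed above. The sketch compares its MSE with the closed form $K_1\eta$ from (35) and with a random training matrix of equal energy; parameter values and function names are illustrative.

```python
import numpy as np

def training_mse(T, gamma):
    """tr(R_dF) = tr((I + T T^H / gamma)^{-1}), the objective of (34)."""
    K1 = T.shape[0]
    return np.trace(np.linalg.inv(np.eye(K1) + T @ T.conj().T / gamma)).real

def optimal_training(K1, N_t, p_t):
    """A training matrix with T T^H = (p_t N_t / K1) I, built from K1 rows of an
    N_t-point DFT matrix (one possible choice; requires N_t >= K1)."""
    k = np.arange(K1)[:, None]
    n = np.arange(N_t)[None, :]
    dft_rows = np.exp(-2j * np.pi * k * n / N_t)   # mutually orthogonal rows, each of energy N_t
    return np.sqrt(p_t / K1) * dft_rows

K1, N_t, p_t, gamma = 2, 8, 1.5, 0.5                  # illustrative values
T_opt = optimal_training(K1, N_t, p_t)
eta = gamma * K1 / (gamma * K1 + p_t * N_t)           # eta in (35)
print(training_mse(T_opt, gamma), K1 * eta)          # both equal tr(R_dF) = K1 * eta

rng = np.random.default_rng(6)
T_rand = rng.standard_normal((K1, N_t)) + 1j * rng.standard_normal((K1, N_t))
T_rand *= np.sqrt(p_t * N_t) / np.linalg.norm(T_rand)   # same energy: tr(T T^H) = p_t N_t
print(training_mse(T_rand, gamma) >= training_mse(T_opt, gamma))
```

Since $x \mapsto 1/(1 + x/\gamma_2)$ is convex, spreading the training energy equally over the $K_1$ eigen-directions can never be worse than any other split, which is what the last line checks for one random draw.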

C. Optimization Over Source Covariance 

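With the separately designed training, a new lower bound on the ergodic capacity can be written as

$$C_{L2} = E_{\hat{F}}\Big[\max_{\mathbf{R}_{d_1}} \log\det\Big(\mathbf{I} + \frac{\hat{\mathbf{F}}\mathbf{R}_{d_1}\hat{\mathbf{F}}^H}{\eta\,\mathrm{tr}(\mathbf{R}_{d_1}) + \gamma_2}\Big)\Big], \qquad (36)$$

where $\eta$ is given in (35), and the constraint is $\mathrm{tr}(\mathbf{R}_{d_1}) = p_d$; we leave the IT constraint $p_d \le \chi_1N_l$ to the later optimization.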

Define F w = FRJ 1 ^ 2 = (1 — t^)^ 1 / 2 ! 1 as a row-whitened version of F. Since the entries of F w 
are random Gaussian variables with zero means and unit variances, the distribution of F w is not 
related to the system parameters, p t ,p d ,N h N t , and N d . Let the EVD of (F W ) H F W be QAQ H , 
where Q is an unitary matrix and A = diag{Ai, A 2 , . . . , AftTxlJj with Aj's being arranged in a 

5 Considering MSE-based channel estimation does not deteriorate the main merit of the proposed study since we aim to provide 
a practical design. 

6 Recall that we have assumed that K\ < Ki- 
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non-increasing order. The distributions of Aj's are not related to the system parameters, either. 
Define X = ^ ~ ^ Q H R rfl Q . Then, the capacity lower bound is rewritten as 



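$$
C_{L2} = \mathbb{E}_{\boldsymbol{\Lambda}}\left[\max_{\mathbf{X}:\ \mathrm{tr}(\mathbf{X}) = \rho_{\mathrm{eff}},\ \mathbf{X}\succeq\mathbf{0}} \log\left|\mathbf{I} + \mathbf{X}\boldsymbol{\Lambda}\right|\right], \tag{37}
$$

where

$$
\rho_{\mathrm{eff}} = \frac{p_d(1-\eta)}{\eta p_d + \gamma_2} = \frac{p_d p_t N_t}{\gamma_2(p_d K_1 + \gamma_2 K_1 + p_t N_t)} \tag{38}
$$

is defined as the effective signal-to-noise ratio (SNR). It is easy to show that the optimal $\mathbf{X}$ is diagonal, $\mathbf{X} = \mathrm{diag}\{x_1, x_2, \ldots, x_{K_1}\}$, with entries given by the standard water-filling algorithm [25] as

$$
x_i = \left(\mu - \frac{1}{\lambda_i}\right)^+, \tag{39}
$$

where $(\cdot)^+$ denotes $\max(\cdot, 0)$ and $\mu$ is the water level chosen to satisfy $\sum_{i=1}^{K_1} x_i = \rho_{\mathrm{eff}}$. Define $q_k = \sum_{i=1}^{k}\left(\frac{1}{\lambda_{k+1}} - \frac{1}{\lambda_i}\right)$ for $k = 1, \ldots, K_1 - 1$, $q_0 = 0$, and $q_{K_1} = +\infty$. Then, $C_{L2}$ is expressed as $C_{L2} = \mathbb{E}_{\boldsymbol{\Lambda}}[g(\rho_{\mathrm{eff}}, \lambda_i)]$, where $g(\rho_{\mathrm{eff}}, \lambda_i)$ is the segment function

$$
g(\rho_{\mathrm{eff}}, \lambda_i) = \sum_{i=1}^{k} \log\left(\frac{\lambda_i}{k}\left(\rho_{\mathrm{eff}} + \sum_{j=1}^{k}\frac{1}{\lambda_j}\right)\right), \quad \rho_{\mathrm{eff}} \in (q_{k-1}, q_k]. \tag{40}
$$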
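The water-filling solution (39) and its segment form (40) can be cross-checked numerically. The sketch below is a minimal illustration under our own assumptions (natural logarithm, eigenvalues already sorted in non-increasing order, hypothetical function names); it is not taken from [25].

```python
import math


def water_fill(lams, rho):
    """Water-filling (39): x_i = (mu - 1/lam_i)^+ with sum_i x_i = rho > 0.
    `lams` must be sorted in non-increasing order. Returns (x, mu)."""
    inv = [1.0 / l for l in lams]
    for k in range(len(lams), 0, -1):              # largest feasible number of active modes
        mu = (rho + sum(inv[:k])) / k
        if mu > inv[k - 1]:
            return [max(mu - v, 0.0) for v in inv], mu
    raise ValueError("rho must be positive")


def thresholds(lams):
    """q_0 = 0, q_k = sum_{i<=k} (1/lam_{k+1} - 1/lam_i) for 1 <= k < K1, q_K1 = inf."""
    q = [0.0]
    for k in range(1, len(lams)):
        q.append(sum(1.0 / lams[k] - 1.0 / lams[i] for i in range(k)))
    q.append(math.inf)
    return q


def g_segment(rho, lams):
    """Segment form (40) of max log|I + X Lambda| (natural logarithm)."""
    q = thresholds(lams)
    k = next(k for k in range(1, len(lams) + 1) if q[k - 1] < rho <= q[k])
    s = sum(1.0 / l for l in lams[:k])
    return sum(math.log(l / k * (rho + s)) for l in lams[:k])


if __name__ == "__main__":
    lams = [3.0, 1.5, 0.4]                         # hypothetical eigenvalues
    x, mu = water_fill(lams, 2.5)
    direct = sum(math.log(1.0 + xi * li) for xi, li in zip(x, lams))
    print(x, mu, direct, g_segment(2.5, lams))
```

On segment $k$ exactly $k$ modes are active and $\log(1 + x_i\lambda_i) = \log(\mu\lambda_i)$ with $\mu = (\rho_{\mathrm{eff}} + \sum_{j\le k} 1/\lambda_j)/k$, which is why the two computations agree.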

Lemma 4.2: For given $\lambda_i$'s, $g(\rho_{\mathrm{eff}}, \lambda_i)$ is a continuous, differentiable, increasing, and concave function of $\rho_{\mathrm{eff}}$.

Proof: See [9]. ■
Corollary 4.1: $C_{L2}$ is a continuous, differentiable, increasing, and concave function of $\rho_{\mathrm{eff}}$.
Proof: Apply Lemma 4.2 and the property that the distributions of the $\lambda_i$'s are independent of $\rho_{\mathrm{eff}}$. ■
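Corollary 4.1 can be illustrated by Monte Carlo simulation. The sketch below assumes $K_1 = 2$ (so the eigenvalues of $\mathbf{F}_w^H\mathbf{F}_w$ have a closed form), i.i.d. $\mathcal{CN}(0,1)$ entries in $\mathbf{F}_w$, and a hypothetical $K_2 = 4$; all names are ours. Because the same eigenvalue samples are reused for every $\rho_{\mathrm{eff}}$, the estimate inherits the monotonicity and concavity of $g$ from Lemma 4.2.

```python
import math
import random


def g_wf(rho, lams):
    """Water-filling capacity log|I + X Lambda| with tr(X) = rho, i.e., g in (40).
    `lams` must be sorted in non-increasing order."""
    inv = [1.0 / l for l in lams]
    for k in range(len(lams), 0, -1):
        mu = (rho + sum(inv[:k])) / k               # water level with k active modes
        if mu > inv[k - 1]:
            return sum(math.log(mu * l) for l in lams[:k])
    return 0.0


def eigs_2x2_gram(k2, rng):
    """Eigenvalues (descending) of F_w^H F_w for a K2 x 2 matrix F_w
    with i.i.d. CN(0, 1) entries (an assumption of this sketch)."""
    sd = math.sqrt(0.5)
    rows = [[complex(rng.gauss(0, sd), rng.gauss(0, sd)) for _ in range(2)]
            for _ in range(k2)]
    a = sum(abs(r[0]) ** 2 for r in rows)
    d = sum(abs(r[1]) ** 2 for r in rows)
    b = sum(r[0].conjugate() * r[1] for r in rows)
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return [mid + rad, mid - rad]


def c_l2(rho, samples):
    """Monte Carlo estimate of C_L2 = E_Lambda[g(rho, lambda_i)]."""
    return sum(g_wf(rho, lams) for lams in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [eigs_2x2_gram(4, rng) for _ in range(2000)]
    for rho in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(rho, round(c_l2(rho, samples), 4))
```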

D. Optimization Over Power Allocation 

Averaged over the entire CR frame, the lower bound on the ergodic capacity becomes

$$
C_{AL} = \frac{N_d}{N} C_{L2}, \tag{41}
$$

where $N_d/N$ accounts for the fact that data transmission occupies only $N_d$ of the $N$ symbols in a frame. $C_{AL}$ in (41) is a function of the system parameters $p_d$, $p_t$, $N_l$, $N_t$, and $N_d$, whose optimal values should be obtained by maximizing $C_{AL}$. From now on, we treat $N_l$, $N_t$, and $N_d$ as continuous variables. Note that $N_t$ should be no less than $K_1$ in order to obtain a meaningful channel estimate, while $N_d$ must be no less than 1 in order to achieve meaningful transmission.

Since $C_{L2}$ is an increasing function of $\rho_{\mathrm{eff}}$, we can equivalently obtain the optimal $p_t$ and $p_d$ by maximizing $\rho_{\mathrm{eff}}$. Considering the IT constraints and the total power constraint, this optimization problem is expressed as

$$
\begin{aligned}
\max_{p_t,\,p_d}\quad & \rho_{\mathrm{eff}} && \text{(42a)}\\
\text{s.t.}\quad & p_t \le \chi_1 N_l, && \text{(42b)}\\
& p_d \le \chi_1 N_l, && \text{(42c)}\\
& p_t N_t + p_d N_d \le P, && \text{(42d)}
\end{aligned}
$$

for given $N_l$ and $N_t$. Although problem (42) is non-convex, we derive its closed-form solution in the following.

By examining the above three constraints, we find that if $\chi_1 N_l (N - N_l) \le P$, then (42b) and (42c) hold with equality at the optimal solution: $\rho_{\mathrm{eff}}$ is an increasing function of both $p_d$ and $p_t$, and setting $p_t = p_d = \chi_1 N_l$ still satisfies (42d) because $N_t + N_d = N - N_l$. Otherwise, (42d) must hold with equality. Define $\mathcal{T}$ as the set of $N_l$ satisfying $\chi_1 N_l (N - N_l) \le P$ (the explicit expression of $\mathcal{T}$ is omitted here). Obviously, $\mathcal{T}$ is a constant set that can be computed before the optimization. Based on the above discussion, we consider the following two cases:

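Case 1, $N_l \in \mathcal{T}$: In this case, the optimal power allocation is $p_t^* = p_d^* = \chi_1 N_l$. The effective SNR is

$$
\text{S1):}\quad \rho_{\mathrm{eff}}^* = \frac{\chi_1^2 N_l^2 N_t}{\gamma_2(\chi_1 N_l K_1 + \gamma_2 K_1 + \chi_1 N_l N_t)}. \tag{43}
$$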

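Case 2, $N_l \notin \mathcal{T}$: In this case, $\rho_{\mathrm{eff}}$ becomes

$$
\rho_{\mathrm{eff}} = \frac{p_d(P - p_d N_d)}{\gamma_2\big(P + \gamma_2 K_1 - p_d(N_d - K_1)\big)} = \frac{(P - p_t N_t)\,p_t N_t}{\gamma_2\big((P - p_t N_t)K_1 + \gamma_2 K_1 N_d + p_t N_t N_d\big)}. \tag{44}
$$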

To proceed, we first ignore the constraints (42b) and (42c), and denote the solutions that maximize (44) as $p_d'$ and $p_t'$, respectively. Define $c = \frac{(P + \gamma_2 K_1)N_d}{P(N_d - K_1)}$ for $N_d \neq K_1$. Following an approach similar to that in [16], setting the derivative of (44) to zero yields only one valid root $p_d'$ in the region $[0, P/N_d]$ (equivalently, one valid root $p_t'$ in the region $[0, P/N_t]$), and the solutions are expressed as



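$$
p_d' = \begin{cases}
\dfrac{P}{N_d}\left(c - \sqrt{c(c-1)}\right), & N_d > K_1,\\[2mm]
\dfrac{P}{2N_d}, & N_d = K_1,\\[2mm]
\dfrac{P}{N_d}\left(c + \sqrt{c(c-1)}\right), & N_d < K_1,
\end{cases} \tag{45}
$$

$$
p_t' = \frac{P - p_d' N_d}{N_t}. \tag{46}
$$

The corresponding effective SNR is obtained as

$$
\rho_{\mathrm{eff}}' = \begin{cases}
\dfrac{P}{\gamma_2(N_d - K_1)}\left(\sqrt{c} - \sqrt{c-1}\right)^2, & N_d > K_1,\\[2mm]
\dfrac{P^2}{4\gamma_2 K_1(P + \gamma_2 K_1)}, & N_d = K_1,\\[2mm]
\dfrac{P}{\gamma_2(K_1 - N_d)}\left(\sqrt{-c} - \sqrt{1-c}\right)^2, & N_d < K_1.
\end{cases} \tag{47}
$$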
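The closed-form expressions (45)-(47) can be checked against a brute-force grid search over $p_d$ along the total-power line $p_t N_t + p_d N_d = P$. The sketch below is a set of hypothetical helpers written for illustration, not code from [16]; it returns $(p_d', p_t', \rho_{\mathrm{eff}}')$ and treats $N_d$ as a real number, as the text does, with `gamma2` standing for $\gamma_2$.

```python
import math


def rho_eff(p_d, p_t, n_t, k1, gamma2):
    """Effective SNR (38)."""
    return p_d * p_t * n_t / (gamma2 * (p_d * k1 + gamma2 * k1 + p_t * n_t))


def unconstrained_split(P, n_t, n_d, k1, gamma2):
    """Closed-form maximizer of (44) ignoring (42b)-(42c): (p_d', p_t', rho_eff')."""
    if n_d == k1:
        p_d = P / (2 * n_d)
        rho = P ** 2 / (4 * gamma2 * k1 * (P + gamma2 * k1))
    else:
        c = (P + gamma2 * k1) * n_d / (P * (n_d - k1))
        root = math.sqrt(c * (c - 1))              # real because c > 1 or c < 0
        if n_d > k1:
            p_d = P / n_d * (c - root)
            rho = P / (gamma2 * (n_d - k1)) * (math.sqrt(c) - math.sqrt(c - 1)) ** 2
        else:
            p_d = P / n_d * (c + root)
            rho = P / (gamma2 * (k1 - n_d)) * (math.sqrt(-c) - math.sqrt(1 - c)) ** 2
    p_t = (P - p_d * n_d) / n_t                    # (46): the whole budget P is used
    return p_d, p_t, rho


def grid_max(P, n_t, n_d, k1, gamma2, steps=20000):
    """Brute-force maximum of (44) over p_d on a uniform grid in (0, P/N_d)."""
    best = 0.0
    for i in range(1, steps):
        p_d = P / n_d * i / steps
        best = max(best, rho_eff(p_d, (P - p_d * n_d) / n_t, n_t, k1, gamma2))
    return best


if __name__ == "__main__":
    for n_d in (1.0, 2.0, 5.0, 50.0):              # hypothetical N_d values, K1 = 2
        print(n_d, unconstrained_split(200.0, 10.0, n_d, 2, 1.0))
```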

The following lemma is important for the subsequent discussion.

Lemma 4.3: For a given $N_l$, $p_d'$ is an increasing (decreasing) function of $N_t$ ($N_d$), while $p_t'$ is a decreasing (increasing) function of $N_t$ ($N_d$).

Proof: See Appendix II. ■
Define $P_t' = p_t' N_t$ and $P_d' = p_d' N_d$ as the corresponding powers allocated to training and data transmission, respectively; note that $P_t' + P_d' = P$.

Lemma 4.4: For a given $N_l$, $P_d'$ is a decreasing (increasing) function of $N_t$ ($N_d$), while $P_t'$ is an increasing (decreasing) function of $N_t$ ($N_d$).

Proof: The proof follows a method similar to that in Appendix II. ■
Now let us include the constraints (42b) and (42c) again to derive the true optimal solutions of (42). There are three subcases:

1) $p_t' > \chi_1 N_l$: Since (44) has only one valid root $p_t'$, so that $\rho_{\mathrm{eff}}$ increases in $p_t$ for $p_t < p_t'$, the optimal $p_t$ under (42b) must lie on the boundary, which gives $p_t^* = \chi_1 N_l$. The optimal $p_d$ then follows directly as $p_d^* = (P - \chi_1 N_l N_t)/N_d$. The corresponding effective SNR is

$$
\text{S2):}\quad \rho_{\mathrm{eff}}^* = \frac{(P - \chi_1 N_l N_t)\,\chi_1 N_l N_t}{\gamma_2\big((P - \chi_1 N_l N_t)K_1 + \gamma_2 K_1 N_d + \chi_1 N_l N_t N_d\big)}. \tag{48}
$$

Since $p_t'$ is a decreasing function of $N_t$, the region of $N_t$ for this subcase can be represented by $\mathcal{T}_{t1}(N_l) = [K_1, N_1]$, where $N_1$ can be computed from (46) as the value of $N_t$ that makes $p_t' = \chi_1 N_l$.
2) $p_d' > \chi_1 N_l$: Similarly to the previous subcase, we obtain $p_d^* = \chi_1 N_l$ and $p_t^* = (P - \chi_1 N_l N_d)/N_t$. The optimal effective SNR is

$$
\text{S3):}\quad \rho_{\mathrm{eff}}^* = \frac{\chi_1 N_l (P - \chi_1 N_l N_d)}{\gamma_2\big(P + \gamma_2 K_1 - \chi_1 N_l (N_d - K_1)\big)}. \tag{49}
$$

Since $p_d'$ is a decreasing function of $N_d$, the region of $N_d$ for this subcase can be represented by $\mathcal{T}_d(N_l) = [1, N_2]$, where $N_2$ can be computed from (45) as the value of $N_d$ that makes $p_d' = \chi_1 N_l$. Correspondingly, the range of $N_t$ in this subcase is denoted by $\mathcal{T}_{t2}(N_l) = [N - N_l - N_2,\ N - N_l - 1]$.

3) Otherwise, $p_t^* = p_t'$, $p_d^* = p_d'$, and neither (42b) nor (42c) holds with equality. The range of $N_t$ in this subcase is immediately obtained as $\mathcal{T}_{t3}(N_l) = [N_1,\ N - N_l - N_2]$, and the corresponding effective SNR is

$$
\text{S4):}\quad \rho_{\mathrm{eff}}^* = \rho_{\mathrm{eff}}'. \tag{50}
$$

Fig. 5 illustrates where subcases S2), S3), and S4) take place.
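The three subcases, together with Case 1, can be collected into a single routine. The sketch below is our own illustration under the text's assumptions (continuous $N_t$ and $N_d$; `cap` stands for $\chi_1 N_l$ and `gamma2` for $\gamma_2$; all names hypothetical). It returns the optimal powers with a label S1)-S4), and a brute-force helper checks the result by using, for each $p_d$, the largest feasible $p_t$, which is optimal because $\rho_{\mathrm{eff}}$ increases in $p_t$.

```python
import math


def rho_eff(p_d, p_t, n_t, k1, gamma2):
    """Effective SNR (38)."""
    return p_d * p_t * n_t / (gamma2 * (p_d * k1 + gamma2 * k1 + p_t * n_t))


def unconstrained_pd(P, n_d, k1, gamma2):
    """p_d' from (45)."""
    if n_d == k1:
        return P / (2 * n_d)
    c = (P + gamma2 * k1) * n_d / (P * (n_d - k1))
    root = math.sqrt(c * (c - 1))
    return P / n_d * (c - root if n_d > k1 else c + root)


def optimal_powers(P, cap, n_t, n_d, k1, gamma2):
    """Solve (42) for fixed N_l, N_t with cap = chi1 * N_l.
    Returns (p_d*, p_t*, subcase label)."""
    if cap * (n_t + n_d) <= P:                 # Case 1: N_l in T
        return cap, cap, "S1"
    p_d = unconstrained_pd(P, n_d, k1, gamma2)
    p_t = (P - p_d * n_d) / n_t                # (46)
    if p_t > cap:                              # subcase 1): training power capped
        return (P - cap * n_t) / n_d, cap, "S2"
    if p_d > cap:                              # subcase 2): data power capped
        return cap, (P - cap * n_d) / n_t, "S3"
    return p_d, p_t, "S4"                      # subcase 3): no IT cap active


def brute_force(P, cap, n_t, n_d, k1, gamma2, steps=20000):
    """Grid check: for each p_d, take the largest feasible p_t."""
    top = min(cap, P / n_d)
    best = 0.0
    for i in range(1, steps + 1):
        p_d = top * i / steps
        p_t = min(cap, (P - p_d * n_d) / n_t)
        best = max(best, rho_eff(p_d, p_t, n_t, k1, gamma2))
    return best


if __name__ == "__main__":
    for cap in (3.0, 3.4, 10.0):               # hypothetical chi1 * N_l values
        print(cap, optimal_powers(200.0, cap, 10.0, 50.0, 2, 1.0))
```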

Example 4.1: The same system setup as in Example 3.1 is used here, with two additional parameters, $P = 20{,}000$ and $\chi_1 = 0.16$. The optimal $p_t^*$ and $p_d^*$ versus $N_t$ at $N_l = 200$ are shown in Fig. 6. The following observations are made:

• $p_t^*$ is constant over S2), since it is capped at $\chi_1 N_l$; $p_t^*$ is decreasing over S4), since it equals $p_t'$, which is a decreasing function of $N_t$ by Lemma 4.3; $p_t^*$ is increasing over S3), as seen from $p_t^* = (P - \chi_1 N_l N_d)/N_t$;

• $p_d^*$ is decreasing over S2), as seen from $p_d^* = (P - \chi_1 N_l N_t)/N_d$; $p_d^*$ is increasing over S4), since it equals $p_d'$; $p_d^*$ is constant over S3), since it is capped at $\chi_1 N_l$.

Define $P_t^* = p_t^* N_t$ and $P_d^* = p_d^* N_d$ as the powers allocated to training and data transmission. We plot $P_t^*$ and $P_d^*$ versus $N_t$ in Fig. 7. From the observations in Fig. 6, we know that over subcases S2) and S3), $P_d^*$ is a decreasing function of $N_t$, while $P_t^*$ is an increasing function of $N_t$. Furthermore, since $P_t^* = P_t'$ and $P_d^* = P_d'$ over S4), Lemma 4.4 implies that $P_t^*$ remains increasing and $P_d^*$ remains decreasing over S4) as well.
E. Optimization Over Time Allocation

Substituting the closed-form expression of $\rho_{\mathrm{eff}}^*$ back into $C_{AL}$, we formulate the optimization over the remaining variables, $N_l$ and $N_t$, as

$$
\max_{N_l,\,N_t}\ C_{AL} \quad \text{s.t.}\quad N_t \ge K_1,\ \ N_d = N - N_l - N_t \ge 1. \tag{51}
$$

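The discussion is divided into four parts, corresponding to the four subcases S1) to S4) in the previous subsection:

Subcase S1): From (43), it can be readily checked that $\partial \rho_{\mathrm{eff}}^*/\partial N_t > 0$ and $\partial^2 \rho_{\mathrm{eff}}^*/\partial N_t^2 < 0$ for a given $N_l$. Since $C_{L2}$ is an increasing concave function of $\rho_{\mathrm{eff}}^*$, we have

$$
\frac{\partial C_{L2}}{\partial N_t} = \frac{\partial C_{L2}}{\partial \rho_{\mathrm{eff}}^*}\,\frac{\partial \rho_{\mathrm{eff}}^*}{\partial N_t} > 0, \tag{52}
$$

$$
\frac{\partial^2 C_{L2}}{\partial N_t^2} = \frac{\partial^2 C_{L2}}{\partial (\rho_{\mathrm{eff}}^*)^2}\left(\frac{\partial \rho_{\mathrm{eff}}^*}{\partial N_t}\right)^2 + \frac{\partial C_{L2}}{\partial \rho_{\mathrm{eff}}^*}\,\frac{\partial^2 \rho_{\mathrm{eff}}^*}{\partial N_t^2} < 0. \tag{53}
$$

Therefore, $C_{L2}$ is an increasing concave function of $N_t$. Since $N_d/N = (N - N_l - N_t)/N$ is a linearly decreasing function of $N_t$, the product rule gives $\frac{\partial^2 C_{AL}}{\partial N_t^2} = -\frac{2}{N}\frac{\partial C_{L2}}{\partial N_t} + \frac{N_d}{N}\frac{\partial^2 C_{L2}}{\partial N_t^2} < 0$, so $C_{AL}$ is concave in $N_t$. Therefore, for a given $N_l \in \mathcal{T}$, efficient convex optimization tools can be applied to find $N_t$.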
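A minimal sketch of the subcase S1) search follows. It assumes one fixed eigenvalue realization in place of the expectation over $\boldsymbol{\Lambda}$ (the concavity argument above applies to each realization) and integer $N_t$; integer ternary search is used as one simple stand-in for the convex optimization tools mentioned in the text. All names and numbers are hypothetical.

```python
import math


def g_wf(rho, lams):
    """Water-filling capacity g in (40); `lams` sorted non-increasingly."""
    inv = [1.0 / l for l in lams]
    for k in range(len(lams), 0, -1):
        mu = (rho + sum(inv[:k])) / k
        if mu > inv[k - 1]:
            return sum(math.log(mu * l) for l in lams[:k])
    return 0.0


def rho_s1(chi1, n_l, n_t, k1, gamma2):
    """Effective SNR (43) when p_t = p_d = chi1 * N_l (Case 1)."""
    a = chi1 * n_l
    return a * a * n_t / (gamma2 * (a * k1 + gamma2 * k1 + a * n_t))


def c_al_s1(n_t, N, n_l, chi1, k1, gamma2, lams):
    """(N_d / N) * g for one fixed eigenvalue realization (a simplification)."""
    n_d = N - n_l - n_t
    return n_d / N * g_wf(rho_s1(chi1, n_l, n_t, k1, gamma2), lams)


def argmax_concave(f, lo, hi):
    """Integer ternary search; valid because f is strictly concave on [lo, hi]."""
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if f(m1) < f(m2):
            lo = m1 + 1
        else:
            hi = m2
    return max(range(lo, hi + 1), key=f)


if __name__ == "__main__":
    lams = [2.0, 0.5]                          # hypothetical eigenvalues
    f = lambda n_t: c_al_s1(n_t, 300, 20, 0.16, 2, 1.0, lams)
    print(argmax_concave(f, 2, 279))
```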

Subcase S2): For this subcase, there is no simple structure to exploit, so we propose a one-dimensional search over $N_t \in \mathcal{T}_{t1}(N_l)$.

Subcase S4): We provide the following lemma for this subcase:

Lemma 4.5: $C_{AL}$ is a decreasing (increasing) function of $N_t$ ($N_d$) over the region $N_t \in \mathcal{T}_{t3}(N_l)$.

Proof: See Appendix III. ■
Therefore, we should reduce $N_t$ as much as possible if subcase S4) takes place, so the optimal $N_t$ in this case is simply $N_1$.

Subcase S3): We provide the following lemma for this subcase:

Lemma 4.6: $C_{AL}$ over $N_t \in \mathcal{T}_{t2}(N_l)$ is smaller than that over $N_t \in \mathcal{T}_{t3}(N_l)$.

Proof: Consider the optimization over $N_t \in \mathcal{T}_{t2}(N_l)$ but without the IT constraints (42b) and (42c). Then, $p_d'$ and $p_t'$ become the optimal power values over $N_t \in \mathcal{T}_{t2}(N_l)$. Similarly to subcase




S4), the resulting optimal capacity lower bound, denoted as $C_{AL}'$, is a decreasing function of $N_t$. Since region $\mathcal{T}_{t2}(N_l)$ lies to the right of $\mathcal{T}_{t3}(N_l)$, as shown in Fig. 5, $C_{AL}'$ over $\mathcal{T}_{t2}(N_l)$ is smaller than $C_{AL}'$ over $\mathcal{T}_{t3}(N_l)$, which equals $C_{AL}$ there because the IT constraints are inactive. Adding the IT constraints back, the true optimal $C_{AL}$ over $\mathcal{T}_{t2}(N_l)$ cannot exceed $C_{AL}'$, which in turn is smaller than $C_{AL}$ over $\mathcal{T}_{t3}(N_l)$. ■
Based on the above discussion, the optimal time allocation is found from the following rules:

• A one-dimensional search over $N_l$ is applied.

- For any $N_l \in \mathcal{T}$, $N_t$ can be efficiently found with convex optimization tools.

- For any $N_l \notin \mathcal{T}$, only $N_t$ in the region of S2), i.e., $N_t \in \mathcal{T}_{t1}(N_l)$, needs to be checked. In fact, as shown in Fig. 6, the set $\mathcal{T}_{t1}(N_l)$ is usually small.
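Putting the rules together, the sketch below performs the outer one-dimensional search over $N_l$ and, for each $N_l$, the reduced search over $N_t$. It is an illustration under stated assumptions, not the authors' implementation: it uses one fixed eigenvalue realization instead of the expectation, integer $N_t$, and hypothetical values such as $N = 1000$, $\gamma_2 = 1$, and $\boldsymbol{\Lambda} = \mathrm{diag}\{2, 0.5\}$. Because $N_1$ need not be an integer, the scan also keeps the first integer $N_t$ past the S2) region as the S4) candidate.

```python
import math


def g_wf(rho, lams):
    """Water-filling capacity g in (40); `lams` sorted non-increasingly."""
    inv = [1.0 / l for l in lams]
    for k in range(len(lams), 0, -1):
        mu = (rho + sum(inv[:k])) / k
        if mu > inv[k - 1]:
            return sum(math.log(mu * l) for l in lams[:k])
    return 0.0


def rho_eff(p_d, p_t, n_t, k1, g2):
    """Effective SNR (38)."""
    return p_d * p_t * n_t / (g2 * (p_d * k1 + g2 * k1 + p_t * n_t))


def powers(P, cap, n_t, n_d, k1, g2):
    """Closed-form solution of (42) with cap = chi1 * N_l; returns (p_d, p_t, label)."""
    if cap * (n_t + n_d) <= P:                       # Case 1
        return cap, cap, "S1"
    if n_d == k1:                                    # (45), middle branch
        p_d = P / (2 * n_d)
    else:
        c = (P + g2 * k1) * n_d / (P * (n_d - k1))
        root = math.sqrt(c * (c - 1))
        p_d = P / n_d * (c - root if n_d > k1 else c + root)
    p_t = (P - p_d * n_d) / n_t                      # (46)
    if p_t > cap:
        return (P - cap * n_t) / n_d, cap, "S2"
    if p_d > cap:
        return cap, (P - cap * n_d) / n_t, "S3"
    return p_d, p_t, "S4"


def c_al(n_l, n_t, N, P, chi1, k1, g2, lams):
    """(N_d/N) g(rho_eff*) for ONE fixed eigenvalue draw; returns (value, label)."""
    n_d = N - n_l - n_t
    p_d, p_t, label = powers(P, chi1 * n_l, n_t, n_d, k1, g2)
    return n_d / N * g_wf(rho_eff(p_d, p_t, n_t, k1, g2), lams), label


def best_nt_rules(n_l, N, P, chi1, k1, g2, lams):
    """Reduced search over N_t following the rules above."""
    f = lambda n_t: c_al(n_l, n_t, N, P, chi1, k1, g2, lams)[0]
    lo, hi = k1, N - n_l - 1
    if chi1 * n_l * (N - n_l) <= P:                  # N_l in T: concave in N_t
        while hi - lo > 2:
            m1, m2 = lo + (hi - lo) // 3, hi - (hi - lo) // 3
            if f(m1) < f(m2):
                lo = m1 + 1
            else:
                hi = m2
        return max(range(lo, hi + 1), key=f)
    cands = []
    for n_t in range(lo, hi + 1):                    # scan S2) and keep the first
        cands.append(n_t)                            # integer beyond it (plays N_1)
        if c_al(n_l, n_t, N, P, chi1, k1, g2, lams)[1] != "S2":
            break
    return max(cands, key=f)


def best_nt_exhaustive(n_l, N, P, chi1, k1, g2, lams):
    """Reference: try every admissible N_t."""
    f = lambda n_t: c_al(n_l, n_t, N, P, chi1, k1, g2, lams)[0]
    return max(range(k1, N - n_l), key=f)


def best_time_allocation(N, P, chi1, k1, g2, lams):
    """Outer one-dimensional search over N_l; returns (C_AL, N_l, N_t)."""
    best = None
    for n_l in range(1, N - k1):
        n_t = best_nt_rules(n_l, N, P, chi1, k1, g2, lams)
        val = c_al(n_l, n_t, N, P, chi1, k1, g2, lams)[0]
        if best is None or val > best[0]:
            best = (val, n_l, n_t)
    return best


if __name__ == "__main__":
    print(best_time_allocation(1000, 20000.0, 0.16, 2, 1.0, [2.0, 0.5]))
```

The exhaustive search over $N_t$ is included only as a reference for checking the reduced search on a few values of $N_l$.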

V. Simulation Results 

In this section, we numerically examine the proposed scheme through several examples. The system setup and parameters are the same as those in Example 4.1. We assume that the transmit power of the PR is $\sigma_s^2 = 20$ dB, so that $N_l = 10$ is sufficient to guarantee a very good estimate of $\beta_j$, $j = 1, 2$.

1) $C_{AL}$ as a function of $N_l$ and $N_t$: In the first example, we take $\chi_1 = 0.16$ and plot $C_{AL}$ as a function of $N_l$ and $N_t$ in Fig. 8. The surface of $C_{AL}$ resembles a tent in three-dimensional space, with a unique peak where $C_{AL}$ is maximized. This suggests the following conjecture, which remains to be proved.

Conjecture 1: $C_{AL}$ is a jointly concave function of $N_l$ and $N_t$.

2) Optimal $N_l$ and $N_t$ as functions of $\chi_1$: Besides introducing the additional parameter $N_l$, the effect of IT is another difference between our work and that in [16]. It is therefore of interest to examine how the optimal time allocation is affected by the IT requirement. The optimal values of $N_l$ and $N_t$, denoted as $N_l^*$ and $N_t^*$, respectively, are shown versus $\chi_1$ in Fig. 9. We make the following observations:

• $N_l^*$ is a decreasing function of $\chi_1$. This is because when higher IT can be tolerated at the PR, less learning time is needed, which reduces the learning overhead.

• $N_t^*$ first increases and then decreases as $\chi_1$ increases. The reason is that the optimal $N_t^*$ is not only a function of $\chi_1$ but is also affected by $N_l$. When $\chi_1$ is small, $p_t$ is likely to be bounded by $\chi_1 N_l$, as seen from (42b). Therefore, the total training power $p_t N_t$ may not be sufficient for a small $N_t$, so the training time has to be increased. However, once $\chi_1$ becomes larger, sufficient training power can be obtained within a very short training time, so $N_t$ should be decreased to save training overhead for data transmission. Finally, as also seen in Fig. 9, $N_t^*$ reduces to its lower bound $K_1 = 2$ when $\chi_1 = 10$.
3) The maximum $C_{AL}$ as a function of $\chi_1$: In this example, we examine how the maximum capacity lower bound, denoted as $C_{AL}^*$, varies with the IT power level. Since $\chi_1$ is a linear scaling of the IT power level, we instead examine $\chi_1$; the curve of $C_{AL}^*$ versus $\chi_1$ is displayed in Fig. 10. To illustrate the effect of optimal power allocation on the capacity bound, we also consider an equal power allocation, $p_d = p_t = \min\{\chi_1 N_l, P/(N - N_l)\}$, for which the corresponding optimal $C_{AL}$ is obtained by searching over all candidate values of $N_l$ and $N_t$. First, $C_{AL}^*$ is a non-decreasing function of $\chi_1$, which is intuitively expected. However, when $\chi_1$ is sufficiently large, the IT constraints no longer take effect, and $C_{AL}^*$ cannot be increased further. Moreover, equal power allocation achieves a capacity comparable to that of the optimal power allocation when $\chi_1$ is small. This is because at low $\chi_1$ the optimal power allocation is essentially capped by the IT constraints at $p_d = p_t = \chi_1 N_l$, which coincides with equal power allocation. However, when $\chi_1$ is relatively large, equal power allocation becomes suboptimal.

Since equal power allocation between training and data transmission yields relatively good performance, we next use this power allocation scheme to demonstrate the achievable rate of a practical modulation and coding scheme (MCS) with discrete bit granularity $\Delta > 0$. The well-known SNR "gap" approximation, denoted by $\Gamma$, is adopted; it measures the power required by the considered MCS in addition to the minimum power given by the standard capacity formula to support a given decoding error probability [23]. Then, the optimal discrete bit-loading algorithm [24] can be applied to obtain the achievable rate. For a practical MCS with $\Delta = 0.5$ and $\Gamma = 3$ dB, the corresponding achievable rate is also included in Fig. 10, which demonstrates the usefulness of the proposed study for practical system design.
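The discrete bit loading described above can be sketched with a standard greedy (Hughes-Hartogs / Levin-Campello style) loader. We do not know the exact algorithm of [24]; the sketch below is an illustration that assumes each eigenmode $i$ has gain $\lambda_i$, the total power budget is $\rho_{\mathrm{eff}}$, and carrying $b$ bits on mode $i$ costs $\Gamma(2^{b} - 1)/\lambda_i$, with $b$ a multiple of $\Delta$. The function name and example values are hypothetical.

```python
import heapq


def greedy_bit_loading(gains, budget, delta=0.5, gap_db=3.0):
    """Greedy discrete bit loading with SNR gap Gamma (in dB) and granularity delta.

    Mode i carrying b bits needs power Gamma * (2**b - 1) / gains[i].
    Returns (bits per mode, power used).
    """
    gap = 10.0 ** (gap_db / 10.0)
    bits = [0.0] * len(gains)

    def power(i, b):
        return gap * (2.0 ** b - 1.0) / gains[i]

    # Min-heap of (extra power for the next delta bits, mode index).
    heap = [(power(i, delta), i) for i in range(len(gains))]
    heapq.heapify(heap)
    used = 0.0
    while heap:
        inc, i = heapq.heappop(heap)
        if used + inc > budget:
            break                       # cheapest increment does not fit; stop
        used += inc
        bits[i] += delta
        heapq.heappush(heap, (power(i, bits[i] + delta) - power(i, bits[i]), i))
    return bits, used


if __name__ == "__main__":
    bits, used = greedy_bit_loading([2.0, 0.5], budget=10.0)
    print(bits, sum(bits), used)
```

Because the per-mode cost is convex in $b$, always adding the cheapest next increment yields the largest number of $\Delta$-increments that fit within the budget, which is why a simple greedy rule suffices in this setting.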
VI. Conclusion

In this work, we studied the transmission design for a multi-antenna CR link that shares spectrum with a PR link. This study makes two major contributions. First, we proposed a concrete CR deployment strategy consisting of environment learning, channel training, and data transmission stages, and provided detailed formulations for each stage. Second, by analyzing the system parameters, we developed algorithms to find the optimal power and time allocation across the stages so as to maximize the lower bound on the CR ergodic capacity. A closed-form power allocation was found for a given time allocation, while the optimal time allocation was found via a two-dimensional search over a confined set.

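Appendix I
Proof of Lemma 4.1

We drop the index $n$ here for brevity. The mutual information (MI) between the output $\mathbf{y}_2$ and $\mathbf{d}_1$ conditioned on the channel estimate $\hat{\mathbf{F}}$ is

$$
I(\mathbf{y}_2; \mathbf{d}_1 \mid \hat{\mathbf{F}}) = h(\mathbf{d}_1 \mid \hat{\mathbf{F}}) - h(\mathbf{d}_1 \mid \mathbf{y}_2, \hat{\mathbf{F}}) = h(\mathbf{d}_1) - h(\mathbf{d}_1 \mid \mathbf{y}_2, \hat{\mathbf{F}}). \tag{54}
$$

A lower bound on the capacity is obtained by directly taking $\mathbf{d}_1$ as a Gaussian random vector. In this case, the differential entropy is $h(\mathbf{d}_1) = \log|\pi e\,\mathbf{R}_{d_1}|$. By definition,

$$
h(\mathbf{d}_1 \mid \hat{\mathbf{F}}, \mathbf{y}_2) = h(\mathbf{d}_1 - f(\mathbf{y}_2) \mid \hat{\mathbf{F}}, \mathbf{y}_2) \tag{55}
$$

for any function $f(\cdot)$. Moreover,

$$
h(\mathbf{d}_1 - f(\mathbf{y}_2) \mid \hat{\mathbf{F}}, \mathbf{y}_2) \le h(\mathbf{d}_1 - f(\mathbf{y}_2) \mid \hat{\mathbf{F}}) \le \log\left|\pi e\,\mathrm{Cov}(\mathbf{d}_1 - f(\mathbf{y}_2) \mid \hat{\mathbf{F}})\right|, \tag{56}
$$

where $\mathrm{Cov}(\cdot)$ denotes the covariance matrix of a random vector; the first inequality holds because conditioning does not increase entropy, and the second because the Gaussian distribution maximizes entropy for a given covariance. To obtain the tightest bound, we wish to find a function $f(\cdot)$ such that $|\mathrm{Cov}(\mathbf{d}_1 - f(\mathbf{y}_2) \mid \hat{\mathbf{F}})|$ is minimized. Since such a function is hard to find, we instead adopt a linear function $f(\mathbf{y}_2) = \mathbf{A}\mathbf{y}_2$ that minimizes $\mathrm{tr}(\mathrm{Cov}(\mathbf{d}_1 - \mathbf{A}\mathbf{y}_2 \mid \hat{\mathbf{F}}))$. Therefore, $\mathbf{A}$ is the LMMSE estimator of $\mathbf{d}_1$ given $\hat{\mathbf{F}}$ and $\mathbf{y}_2$, i.e.,

$$
\mathbf{A} = \mathbf{R}_{d_1}\hat{\mathbf{F}}^H\left(\hat{\mathbf{F}}\mathbf{R}_{d_1}\hat{\mathbf{F}}^H + \mathbf{R}_{v_2}\right)^{-1}, \tag{57}
$$

where we use the property that $\hat{\mathbf{F}}$ and $\mathbf{v}_2$ are uncorrelated, which results from the LMMSE estimation of $\mathbf{F}$. Therefore,

$$
\mathrm{Cov}(\mathbf{d}_1 - \mathbf{A}\mathbf{y}_2 \mid \hat{\mathbf{F}}) = \left(\mathbf{R}_{d_1}^{-1} + \hat{\mathbf{F}}^H\mathbf{R}_{v_2}^{-1}\hat{\mathbf{F}}\right)^{-1}, \tag{58}
$$

and a lower bound on the capacity is obtained as

$$
I(\mathbf{y}_2; \mathbf{d}_1 \mid \hat{\mathbf{F}}) \ge \log\left|\mathbf{R}_{d_1}\left(\mathbf{R}_{d_1}^{-1} + \hat{\mathbf{F}}^H\mathbf{R}_{v_2}^{-1}\hat{\mathbf{F}}\right)\right| = \log\left|\mathbf{I} + \mathbf{R}_{v_2}^{-1}\hat{\mathbf{F}}\mathbf{R}_{d_1}\hat{\mathbf{F}}^H\right|. \tag{59}
$$

This lower bound is achieved when the input signal $\mathbf{d}_1$ is Gaussian and the effective noise $\mathbf{v}_2$ behaves as Gaussian. Taking the expectation of (59) over $\hat{\mathbf{F}}$ yields the lower bound on the ergodic capacity. Meanwhile, since the variables to be adjusted to maximize this lower bound are $\mathbf{R}_{d_1}$ and $\mathbf{T}_1$, Lemma 4.1 follows.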
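The final equality in (59) is Sylvester's determinant identity, $|\mathbf{I} + \mathbf{A}\mathbf{B}| = |\mathbf{I} + \mathbf{B}\mathbf{A}|$. A small numerical check is sketched below. It assumes real-valued $2\times 2$ matrices as a stand-in for the complex case (transpose in place of Hermitian transpose) and $\mathbf{R}_{v_2} = \sigma_v^2\mathbf{I}$ with a hypothetical $\sigma_v^2$, matching the scalar effective-noise covariance $(\eta\,\mathrm{tr}(\mathbf{R}_{d_1}) + \gamma_2)\mathbf{I}$ that appears in (36). All helper names are ours.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    return [[s * x for x in row] for row in a]


def bound_left(F, Rd, nv):
    """log|Rd (Rd^{-1} + F^T Rv^{-1} F)| with Rv = nv * I (middle term of (59))."""
    inner = add(inv2(Rd), scale(mat_mul(transpose(F), F), 1.0 / nv))
    return math.log(det2(mat_mul(Rd, inner)))


def bound_right(F, Rd, nv):
    """log|I + Rv^{-1} F Rd F^T| (right-hand side of (59))."""
    eye = [[1.0, 0.0], [0.0, 1.0]]
    return math.log(det2(add(eye, scale(mat_mul(mat_mul(F, Rd), transpose(F)), 1.0 / nv))))


if __name__ == "__main__":
    F = [[0.3, -1.2], [0.8, 0.5]]                 # hypothetical channel estimate
    Rd = [[2.0, 0.3], [0.3, 1.0]]                 # hypothetical input covariance
    print(bound_left(F, Rd, 0.7), bound_right(F, Rd, 0.7))
```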

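Appendix II
Proof of Lemma 4.3

We prove that $p_d'$ is a decreasing function of $N_d$ for $N_d > K_1$ and omit the proofs for the other cases, since they are straightforward. In this appendix, we redefine $c = \frac{N_d}{N_d - K_1}$ and let $a = \frac{P}{P + \gamma_2 K_1}$, so that $c > 1 > a > 0$. Then (45) can be rewritten as

$$
p_d' = \frac{P + \gamma_2 K_1}{N_d}\left(c - \sqrt{c(c - a)}\right), \tag{60}
$$

and it suffices to prove that (60) is a decreasing function of $N_d$. Bearing in mind that

$$
\frac{\partial c}{\partial N_d} = -\frac{K_1}{(N_d - K_1)^2} = -\frac{c(c-1)}{N_d},
$$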



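we obtain

$$
\frac{\partial p_d'}{\partial N_d} = \frac{P + \gamma_2 K_1}{N_d^2}\left[c(c-1)\left(\frac{2c - a}{2\sqrt{c(c-a)}} - 1\right) - c + \sqrt{c(c-a)}\right]. \tag{61}
$$

Multiplying the bracket in (61) by $2\sqrt{c(c-a)} > 0$ and using $c(c-a) = c^2 - ac$, we find that the bracket is negative if and only if

$$
2c\sqrt{c(c-a)} > 2c^2 - ac - a. \tag{62}
$$

Both sides of (62) are positive, since $2c^2 > c + 1 > a(c+1)$ for $c > 1$. Squaring both sides and simplifying, (62) is equivalent to

$$
a\left(4c^2 - a(c+1)^2\right) > 0 \iff a < \frac{4c^2}{(c+1)^2}, \tag{63}
$$

which clearly holds, since $a < 1 < \frac{4c^2}{(c+1)^2}$ for $c > 1$. Hence, $\partial p_d'/\partial N_d < 0$.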
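A numerical spot check of Lemma 4.3 for $N_d > K_1$: the sketch evaluates (60) and the Section IV-D form of (45) with hypothetical values of $P$, $K_1$, and $\gamma_2$, confirms that the two expressions agree, and confirms that $p_d'$ decreases in $N_d$. It is a sanity check under these assumed values, not a substitute for the proof.

```python
import math


def p_d_appendix(P, n_d, k1, gamma2):
    """(60): c = N_d/(N_d - K1), a = P/(P + gamma2*K1); valid for N_d > K1."""
    c = n_d / (n_d - k1)
    a = P / (P + gamma2 * k1)
    return (P + gamma2 * k1) / n_d * (c - math.sqrt(c * (c - a)))


def p_d_section4(P, n_d, k1, gamma2):
    """(45) for N_d > K1, with c = (P + gamma2*K1) N_d / (P (N_d - K1))."""
    c = (P + gamma2 * k1) * n_d / (P * (n_d - k1))
    return P / n_d * (c - math.sqrt(c * (c - 1)))


if __name__ == "__main__":
    P, k1, gamma2 = 200.0, 2, 1.0               # hypothetical values
    vals = [p_d_appendix(P, n_d, k1, gamma2) for n_d in range(3, 60)]
    print(all(x > y for x, y in zip(vals, vals[1:])))
```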

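Appendix III
Proof of Lemma 4.5

We only examine the case $N_d > K_1$; the other cases can be handled similarly. Here, $c$ is again as defined in Section IV-D. First, differentiating $\rho_{\mathrm{eff}}^*$ with respect to $N_d$ gives

$$
\frac{\partial \rho_{\mathrm{eff}}^*}{\partial N_d} = \frac{P\left(\sqrt{c} - \sqrt{c-1}\right)^2}{\gamma_2 (N_d - K_1)^2}\left(\frac{K_1\sqrt{c}}{N_d\sqrt{c-1}} - 1\right) = -\frac{\rho_{\mathrm{eff}}^*}{N_d - K_1}\left(1 - \sqrt{\frac{(P + \gamma_2 K_1)K_1}{(P + \gamma_2 N_d)N_d}}\right). \tag{64}
$$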



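From (33), we only need to prove that $\Omega = \frac{N_d}{N}\, g(\rho_{\mathrm{eff}}^*, \lambda_i)$ is an increasing function of $N_d$. The derivative of $\Omega$ with respect to $N_d$ is given by the segment function

$$
\frac{\partial \Omega}{\partial N_d} = \frac{1}{N}\left[g(\rho_{\mathrm{eff}}^*, \lambda_i) - \frac{k\,\rho_{\mathrm{eff}}^*}{\rho_{\mathrm{eff}}^* + \sum_{i=1}^{k}\frac{1}{\lambda_i}}\cdot\frac{N_d}{N_d - K_1}\left(1 - \sqrt{\frac{(P + \gamma_2 K_1)K_1}{(P + \gamma_2 N_d)N_d}}\right)\right], \quad \rho_{\mathrm{eff}}^* \in (q_{k-1}, q_k]. \tag{65}
$$

Since $N_d > K_1$, we have

$$
0 < \frac{N_d}{N_d - K_1}\left(1 - \sqrt{\frac{(P + \gamma_2 K_1)K_1}{(P + \gamma_2 N_d)N_d}}\right) \le 1. \tag{66}
$$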



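It therefore remains to prove that the segment function

$$
w(\rho_{\mathrm{eff}}^*) = g(\rho_{\mathrm{eff}}^*, \lambda_i) - \frac{k\,\rho_{\mathrm{eff}}^*}{\rho_{\mathrm{eff}}^* + \sum_{i=1}^{k}\frac{1}{\lambda_i}}, \quad \rho_{\mathrm{eff}}^* \in (q_{k-1}, q_k], \tag{67}
$$

is nonnegative for $\rho_{\mathrm{eff}}^* > 0$.

For the $k$th segment, i.e., $\rho_{\mathrm{eff}}^* \in (q_{k-1}, q_k]$, we have

$$
w(q_{k-1}) = \sum_{i=1}^{k-1}\log\frac{\lambda_i}{\lambda_k} - \sum_{i=1}^{k-1}\left(1 - \frac{\lambda_k}{\lambda_i}\right). \tag{68}
$$

By letting $x = \lambda_i/\lambda_k - 1 \ge 0$ and using the inequality $\log(1 + x) \ge \frac{x}{1 + x}$ for $x \ge 0$, we know that $w(q_{k-1}) \ge 0$. The derivative of $w(\rho_{\mathrm{eff}}^*)$ in the $k$th segment is

$$
\frac{\partial w(\rho_{\mathrm{eff}}^*)}{\partial \rho_{\mathrm{eff}}^*} = \frac{k\,\rho_{\mathrm{eff}}^*}{\left(\rho_{\mathrm{eff}}^* + \sum_{i=1}^{k}\frac{1}{\lambda_i}\right)^2} > 0. \tag{69}
$$

Therefore, $\Omega$ is an increasing function of $N_d$ over all segments.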
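The key step of the proof, $w(\rho_{\mathrm{eff}}^*) \ge 0$ in (67), can be spot-checked numerically. The sketch below draws hypothetical eigenvalues, locates the segment $(q_{k-1}, q_k]$ containing each $\rho_{\mathrm{eff}}^*$, evaluates $w$ with the natural logarithm, and checks nonnegativity; the names and the sampling range are our assumptions.

```python
import math
import random


def thresholds(lams):
    """q_0 = 0, q_k = sum_{i<=k}(1/lam_{k+1} - 1/lam_i), q_K = +inf."""
    q = [0.0]
    for k in range(1, len(lams)):
        q.append(sum(1.0 / lams[k] - 1.0 / lams[i] for i in range(k)))
    q.append(math.inf)
    return q


def w(rho, lams):
    """w in (67) on the segment (q_{k-1}, q_k] that contains rho > 0."""
    q = thresholds(lams)
    k = next(k for k in range(1, len(lams) + 1) if q[k - 1] < rho <= q[k])
    s = sum(1.0 / l for l in lams[:k])
    g = sum(math.log(l / k * (rho + s)) for l in lams[:k])   # (40)
    return g - k * rho / (rho + s)


if __name__ == "__main__":
    rng = random.Random(0)
    lams = sorted((rng.uniform(0.05, 5.0) for _ in range(4)), reverse=True)
    print(min(w(0.01 * j, lams) for j in range(1, 2000)))
```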
Fig. 1. System model for the multi-antenna CR system.
Fig. 2. CR frame structure.
Fig. 4. The value of $\beta_1$ versus environment learning time.
Fig. 5. Illustration of regions S2), S3), and S4) when $N_l \notin \mathcal{T}$.
Fig. 6. The optimal powers $p_t^*$ and $p_d^*$ versus $N_t$ for $N_l \notin \mathcal{T}$.
Fig. 7. The optimal total training power $P_t^*$ and data transmission power $P_d^*$ versus $N_t$ for $N_l \notin \mathcal{T}$.
Fig. 8. $C_{AL}$ versus $N_l$ and $N_t$ with optimal power allocation, $\chi_1 = 0.16$.
Fig. 9. Optimal $N_l$ and $N_t$ versus $\chi_1$.
Fig. 10. $C_{AL}^*$ versus $\chi_1$.